Why Java Objects only on Heap?
Last month I posed a Java-related question to all my students in a Facebook post:
- Why is it that in Java, Objects are always to be allocated on heap?
- Why is dynamic allocation the only option in the toolbox of a Java developer while creating objects?
It was a very interesting discussion on Facebook. Many participants, my students and others, gave good arguments in defense of the Java language design. One argument that stood out was the possibility of unsafe allocations on the Stack Frame. Special thanks to Ted Lum, Chetan Parmar, Nishant Verma, Vaibhav Desai, Tusharkumar Thomar, Vimal Sakhiya, Clair Roma Henry, Subham Gadi, Makarand Bhosle, Prasad Kowli and Nirav Kothari for their participation and valuable arguments. Let me now put forward my view on why the Java Language Designers might have taken such a decision.
The rest of this article assumes that the reader is aware of the memory models and life cycles of the various types of variables in C++ and Java. To appreciate certain aspects of this article, programming experience is needed. This article is not a tutorial; it’s an attempt to argue over certain language decisions made by the Java Language Designers. Nor is it about favouring one style of design over the other. It’s about exploring the design forces that Language Designers have to evaluate when designing their language and defining its persona.
In languages like C++ an object can be allocated
- on Heap Region using the new operator
- on Stack Region as a local variable.
- on Data Region as a global variable.
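To make the three options concrete, here is a minimal sketch. The `Counter` type and the function name `touchAllRegions` are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>

// Hypothetical type used only for this illustration.
struct Counter {
    int value = 0;
};

// Data Region: a global object that exists for the whole run of the program.
Counter globalCounter;

// Touches one object in each region and returns the sum of their values.
int touchAllRegions() {
    Counter localCounter;                  // Stack Region: lives in this call's frame
    Counter* heapCounter = new Counter();  // Heap Region: created with the new operator

    globalCounter.value = 1;
    localCounter.value = 2;
    heapCounter->value = 3;

    int sum = globalCounter.value + localCounter.value + heapCounter->value;
    assert(sum == 6);

    delete heapCounter;  // heap objects must be released explicitly
    return sum;          // localCounter dies here, together with the frame
}
```

Only the heap object needs an explicit `delete`; the local object disappears with the frame and the global one lasts until the program exits.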
In Java the only option is dynamic allocation on the Heap. Java has no Data Region for objects, so C++ style global objects are not possible. That leaves us with the following questions.
- Why is it that in Java, Objects are not allocated on the Stack Region ( *at least at development time )?
- Why is it that in Java, Objects can’t be created as local variables or passed as By-Value parameters in a method?
When a method is invoked, a special block of memory called an Activation Record or Frame is allocated, usually in the region of memory referred to as the Stack Region ( *the JVM specification allows Activation Records to be allocated on heap ).
The Activation Record, among other things, has the following components.
- Parameters passed to the invoked method
- Local variables needed in the invoked method
- Return address
- Operand Stack.
Although the Activation Record is created at runtime, the Prologue, the code responsible for creating the Activation Record, is baked into the binary ( executable ) at compile time. Hence the structure and size of the Activation Record are already finalised at compile time.
Local Objects --- Unsafe Allocation
The Activation Record is created on method invocation and deleted when the method’s execution completes. All variables allocated in the Activation Record live for the duration of the method’s execution. An object created as a local variable of a method would be allocated in the method’s Activation Record, and would die when the flow of control returns from the method. Meanwhile, during the execution of the method, it’s quite possible that some global reference gets bound to the object. A local reference to the object can even be returned to the invoking method ( call site ). Problems arise from such bindings. Anyone holding such a reference will be dealing with a dangling reference, as the object gets deleted when the method’s execution completes. This indicates that Frame allocation of objects is potentially unsafe.
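The hazard can be sketched in C++ without actually dereferencing the dangling pointer. The `Tracked` type, the global pointer `leaked` and the function `createLocalAndLeak` are hypothetical names, assumed only for this illustration. Some compilers may warn about storing the address of a local variable, which is exactly the point being made.

```cpp
#include <cassert>

// Hypothetical type that counts how many of its instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Stands in for "some global reference" that outlives the method.
const Tracked* leaked = nullptr;

// Creates a local object, lets a global pointer bind to it, then returns.
int createLocalAndLeak() {
    Tracked local;             // allocated in this function's Activation Record
    leaked = &local;           // the global pointer now refers to the local object
    assert(Tracked::alive == 1);
    return Tracked::alive;     // still 1: the frame exists at this point
}   // the frame is popped here: local is destroyed and leaked is left dangling
```

After the call returns, `Tracked::alive` is back to 0 while `leaked` still holds the old address. Any use of the object through `leaked` from then on would be undefined behaviour.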
No Pointer/Reference to Local Objects --- Solution?
How about allocating local objects but not allowing them to be bound to any pointer or reference? Can this solve the problem of unsafe allocation? Let’s categorise such a local object as a Value Type Object with no bindings. The Reference Type Heap Object can be allocated on the heap with all its regular facilities ( references etc ), while the Value Type Local Object would be limited in its flexibility. Passing local objects as arguments and returning local objects would follow pass By-Value semantics. What would be the implication of such a design decision?
Local Objects Without Pointers --- Implications?
C++ allows local objects. It allows pointers and references to local objects. In C++ objects can be passed by value, by ref and also via pointers. We will keep C++ in mind during the following discussion.
Slicing, Substitution, Polymorphism
Under Liskov’s substitution principle, a Child object can act as a substitute for a Parent object. Any method that has a parameter of type Parent can accept objects of both the Parent and Child classes. However, if the parameter is passed using By Value semantics and a Child object is passed to the method, all the data in the Child object ( actual parameter ) can’t be stored in the Parent object ( formal parameter ). The Child object gets sliced while being passed to the method, and only a certain portion of its state is copied into the Parent object. Furthermore, method invocations on the Parent object will always bind to the behaviour in the Parent class, in spite of the method being overridden in the Child class.
This means that for a local object with its state allocated in the Activation Record, the substitution principle works, but overriding and polymorphism are inhibited.
Alternatively, if the above mentioned parameter is passed By Ref or via By Pointer semantics, then not only is substitution allowed but polymorphic behaviour is also preserved.
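A small C++ sketch of the difference follows. The `Parent` and `Child` classes and the three `describe...` functions are hypothetical, written only to illustrate the three parameter-passing styles discussed above.

```cpp
#include <cassert>
#include <string>

struct Parent {
    virtual ~Parent() = default;
    virtual std::string name() const { return "Parent"; }
};

struct Child : Parent {
    int extra = 42;  // state that exists only in Child
    std::string name() const override { return "Child"; }
};

// By Value: the argument is copied into a Parent-sized parameter, so a Child is sliced.
std::string describeByValue(Parent p) { return p.name(); }

// By Ref: only a reference is passed, so the original Child object is used.
std::string describeByRef(const Parent& p) { return p.name(); }

// By Pointer: only an address is passed, so the original Child object is used.
std::string describeByPointer(const Parent* p) {
    assert(p != nullptr);
    return p->name();
}
```

Passing a `Child` to `describeByValue` yields "Parent", because the copy inside the function really is a `Parent`. The other two yield "Child", because the call is dispatched through the virtual table of the original object.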
Baking Local Objects at Compile Time
The reason why it works in By Ref/Pointer and not in By Value has a lot to do with Activation Record. The structure and size of Activation Record is finalised at compile time.
- By-Value Semantics: The Activation Record of the above mentioned method has to accommodate the complete state of the Parent object. The Prologue responsible for constructing the Activation Record provisions only for the space required by a Parent object. Even if we pass a Child object, the method’s Activation Record will not be able to accommodate it. Slicing takes place, and using an overridden method on the sliced object would be unsafe due to the absence of the relevant data members ( *even if the virtual pointer or table were available, the state on which the overridden method works is absent ).
- By Ref/Pointer: In the By Ref/Pointer scenario, however, the Prologue creating the Activation Record provisions only for the Pointer or Reference in the Activation Record. The actual object ( argument ) is now either on the heap, in the data region, or in some other Activation Record in the Stack Region. With the help of the virtual pointer in the object and the corresponding virtual table entry at class level, polymorphic behaviour can be achieved.
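The space argument can be checked at compile time. In the sketch below the types `Parent` and `Child` and the two `slot...` helpers are hypothetical. The pointer-size equality is an assumption that holds on mainstream platforms but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>

struct Parent {
    virtual ~Parent() = default;
    int id = 0;
};

struct Child : Parent {
    double payload[4] = {0, 0, 0, 0};  // extra state that a Parent-sized slot cannot hold
};

// Space a frame must reserve for a Parent parameter passed By Value.
constexpr std::size_t slotByValue() { return sizeof(Parent); }

// Space a frame must reserve for a parameter passed By Pointer.
constexpr std::size_t slotByPointer() { return sizeof(Parent*); }

// A Child does not fit into the slot provisioned for a Parent ...
static_assert(sizeof(Child) > sizeof(Parent), "Child carries more state than Parent");

// ... but a pointer-sized slot is the same whichever class the object belongs to
// (assumption: true on common platforms, not mandated by the standard).
static_assert(sizeof(Parent*) == sizeof(Child*), "object pointers share one size");
```

The slot reserved for a pointer does not depend on the size of the object it points to, which is why the frame layout survives substitution of a Child for a Parent.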
Local Objects --- Benefits
On the positive side, allocating objects in the Activation Record is considered more efficient. This is because the creation and deletion points of such objects are well defined in the execution flow, so no mechanism like a garbage collector is needed to delete them. Further, if we assume that each thread has its own private Stack Region, as is the case in Java, then local objects would be visible only to a single thread. Of course, here we assume that references are not leaked outside the thread’s scope, or that the objects are Value Type objects that don’t need or can’t have references. With this assurance that objects are thread private, thread safety for such objects would be inherent and would not need monitors.
Java Has Local Objects ?
Java doesn’t allow creation of local objects at development time. However, the efficiency and inherent thread safety obtained by allocating objects in the Activation Record have inspired the designers of Java’s dynamic compilers to adopt escape analysis as part of the compiler optimisation phase. This analysis inspects an object’s creation and usage to find out whether the object escapes the method boundary. If the object is used and referred to solely from within the method, it can safely be kept in the Frame ( *in practice a dynamic compiler may go further and replace such an object by its individual fields ). It’s almost as if the dynamic compiler is saying to the developer:
“You allocate objects dynamically on the heap and don’t bother about allocating objects on the Stack Frame. I will do it if deemed necessary and seen as safe”.
It seems that the Java designers are not really against having objects in Frame, but are just worried about the unsafe operations that can result from it. Is it this concern that prompted the Java language designers to bypass the possible unsafe operation of having local object at developer’s level of program abstraction? Or are there other arguments?
Question of Modularity --- Unwarranted Recompilation
In any good modular design, the Modules interact with each other only through their interfaces. Interface of a Module is its public contract with other modules. If the implementation of any given module changes, it doesn’t affect other modules so long as its interface has not changed. In this context, systems supporting modular design must ensure that, if a module is changed internally and gets recompiled, other modules dependent on its unchanged public contract must not be forced to be recompiled.
Local Objects --- Limits Modularity
Now let us explore the dimension of Modularity with respect to local objects. Let’s see how local objects ( in the Activation Record ) constrain flexibility in software development by forcing recompilation across source code modules even when their public contracts are not violated. This argument becomes more important in the case of Java, as it supports Network Dynamic Loading.
Once again, I will take C++ as an example language to establish this argument.
Let’s look at the following code example. The program is being analysed in stages.
In Stage1 we observe that an object pt1 of class Point is created and passed By-Value to a function named target. The target function copies pt1 in its local object p. The Activation Record of target function will contain this local object p. The prologue created by compiler for creation of the Activation Record of target will have the necessary information of memory required by object of class Point. This prologue is invoked when the main calls target function. Everything is fine in this scenario.
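For readers who want something concrete to follow along with, here is a hypothetical single-file reconstruction of what a Stage 1 program matching this description could look like. It is not the original listing: the split into Point.h, Point.cpp and Main.cpp is merged into one file, the initial values and messages are inferred from the output shown later in the article, and `runStage1` is an assumed name standing in for `main`.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical Point.h / Point.cpp, merged: two private ints, public ctor and print().
class Point {
public:
    Point() : x(1), y(2) { std::printf("Constructor of Point invoked\n"); }
    void print() const {
        std::printf("x = %d\n", x);
        std::printf("y = %d\n", y);
    }
private:
    int x;
    int y;
};

// Hypothetical Main.cpp: target receives its own copy p in its Activation Record.
void target(Point p) { p.print(); }

// Stands in for main(): creates pt1 locally and passes it By-Value.
int runStage1() {
    // The caller's compiled code has the size of Point built into it.
    assert(sizeof(Point) == 2 * sizeof(int));

    std::printf("Creating a local object pt1 of Point class\n");
    Point pt1;
    std::printf("Passing pt1 as argument By-Value to target\n");
    target(pt1);
    return 0;
}
```

The point to notice is that both `runStage1` and `target` are compiled with full knowledge of how many bytes a `Point` occupies, even though `x` and `y` are private.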
Let’s make some changes in class Point. We will not change its public interface, but will change some internal behaviour. The public interface is what a client of Point needs to know to consume its services. For the Point class this comprises the signature of the public member function print() for usage and the public default constructor for creation. We change the behaviour of the print() member function by adding a couple of printf calls.
We can now compile the Point.cpp file to generate a new Point.o file. However, we don’t re-compile Main.cpp as the public interface of Point has not changed. If we now link the new Point.o file with old Main.o file to generate the Final executable, will the code run successfully? Will the new behaviour of print() be called?
Point.cpp ( behaviour modified )
As we can see, the output confirms that recompilation of Main.cpp wasn’t essential to get the expected output.
Now let’s change the layout of class Point. Note that the instance variables in the Point class are private and hence don’t form part of the public interface of Point. We introduce a variable z in the Point class and update the constructor and the print function accordingly. We recompile Point.cpp to get Point.o. Now, based on the assertion that the public interface has not changed, we don’t re-compile Main.cpp. Would you now expect the old Main.o to link successfully with the new Point.o? If the linking is successful, will the program run successfully?
Point.h ( added instance variable z )
( changes made in Constructor and print member function ) public interface intact.
Creating a local object pt1 of Point class
Constructor of Point invoked
Passing pt1 as argument By-Value to target
Entering print of Point
x = 1
y = 2
z = 348859056
Leaving print of Point
If we look at the output, we see that the value of z is not what we expect. We know that the latest print function was called, because z is being printed, so why does the value of z look like garbage? The reason is that when we originally compiled Main.cpp, Point had only two instance variables. The prologue for the call to target was created at compile time with the notion that a Point object holds two integers. We changed Point’s structure, and an object of class Point now has three instance variables. There is a mismatch here. The Prologue still thinks that Point has two variables and creates the Activation Record for target accordingly. The print function of Point, however, knows about the latest changes and tries to print x, y and z, whereas the Activation Record holds only x and y. The problem can easily be solved by re-compiling Main.cpp. Nevertheless, the question is: if we didn’t change the public interface of Point, why were we forced to re-compile Main.cpp? The conclusion is that even though Main explicitly depends only on the public interface of Point, internally it is coupled to the implementation of class Point. This is against the principle of Modularity. We didn’t have to change anything in Main, yet we had to re-compile it, and that is unwarranted.
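The mismatch can be put into numbers without running the broken program. In the sketch below, `PointOld` and `PointNew` are hypothetical stand-ins for the layout that the old Main.o was compiled against and the layout that the new Point.o uses. The helper function names are likewise assumed for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Layout known to the old Main.o (assumed: two ints).
struct PointOld { int x; int y; };

// Layout known to the new Point.o (assumed: z appended after y).
struct PointNew { int x; int y; int z; };

// Bytes the stale caller reserves for a Point in its frames.
constexpr std::size_t bytesReservedByOldCaller() { return sizeof(PointOld); }

// Bytes the new print() believes a Point occupies.
constexpr std::size_t bytesExpectedByNewPrint() { return sizeof(PointNew); }

// Where the new print() looks for z, relative to the start of the object.
constexpr std::size_t offsetOfZ() { return offsetof(PointNew, z); }

// z sits at or beyond the end of the space the old caller set aside.
static_assert(offsetof(PointNew, z) >= sizeof(PointOld), "z lies outside the old layout");
```

Since z is located past the bytes that the old caller reserved, the new print() reads whatever happens to be stored next in memory, which is consistent with the garbage value seen in the output.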
We could have avoided re-compilation and increased modularity of overall design by choosing pass By-Ref or Pointer for target function and allocating object on heap. To check this let us roll back Point class to Stage 2 where it had no z. Let us modify Main.cpp so that it starts using Pointer as parameter for target. The object of Point will now be allocated on heap and bound to the ppt1 pointer in main. We will pass this pointer to target function.
Point.cpp ( Rolling it back to Stage 2 )
Main.cpp ( heap allocation and passing pointer to target )
The output is normal and, as we can see, the Main.o object file has been compiled against the version of the Point class that has only two instance variables.
Now we once again introduce the instance variable z in the Point class and make the necessary changes in the constructor and the print() member function. We compile Point.cpp to obtain Point.o, but we don’t compile Main.cpp to obtain a new Main.o file. So can we now link the old Stage 4 Main.o with the Stage 5 Point.o? If yes, will we get the desired output? Once again the argument is that nothing has changed in the public interface of the Point class, hence we don’t need to change Main.cpp.
This output proves that the old Main.o could successfully link with new Point.o and the new Final gave the desired output. Getting the right output without re-compiling Main.cpp was the test of modularity. The public interface had not changed and hence the old compiled code was expected to run against the new code. We succeeded in this effort because we allocated the object on Heap. The Activation Record only had the pointer to the object. Real modularity was obtained with allocation of object on Heap. So to write modular programs in C++ allocation of objects on heap and passing By Pointers/By Ref to functions are more appropriate semantics.
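One caveat is worth making explicit: a `new Point` expression written in Main.cpp still compiles the size of Point into Main.o, so for the client to be fully independent of the layout, the allocation itself should also happen inside the provider. The sketch below is a variation on that idea under this assumption, not the Stage 4 or Stage 5 listing. The functions `createPoint`, `coordinateSum`, `destroyPoint` and `useThroughPointer` are hypothetical names.

```cpp
#include <cassert>

// --- What the client sees (hypothetical Point.h): no data members at all ---
class Point;                           // incomplete type, its size is unknown here
Point* createPoint();                  // allocation happens inside the provider
int coordinateSum(const Point* p);
void destroyPoint(Point* p);

// --- Client (hypothetical Main.cpp): compiled against the declarations only ---
int useThroughPointer() {
    Point* ppt1 = createPoint();       // the frame holds only a pointer
    assert(ppt1 != nullptr);
    int sum = coordinateSum(ppt1);
    destroyPoint(ppt1);
    return sum;
}

// --- Provider (hypothetical Point.cpp): free to add z without touching the client ---
class Point {
public:
    Point() : x(1), y(2), z(3) {}
    int sum() const { return x + y + z; }
private:
    int x;
    int y;
    int z;
};

Point* createPoint() { return new Point(); }
int coordinateSum(const Point* p) { return p->sum(); }
void destroyPoint(Point* p) { delete p; }
```

Nothing in the client section depends on how many fields Point has, so the provider section can be changed and recompiled on its own. This is close in spirit to what Java does, where object creation is resolved against the class that is actually loaded at run time.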
The above mentioned problem would become more serious in Java, as Java supports network class loading. If Java supported local objects, any change in the layout of a service provider class would warrant re-compilation of the consumer class. This would happen even though the public interface of the service provider remained the same. To address this issue of unwarranted re-compilation and to increase modularity, the Java designers may well have decided to go for Heap only allocation of Objects.
I have now proposed my arguments speculating the design reasons for Heap Only allocation of Java Objects. I would like to hear about your arguments and perceptions. Thanks for your patience.
copyright ©Rajesh Patkar, All rights reserved.