Thursday, July 19, 2007

Two Cousins - String and StringBuffer

Introduction
If at all Strings exists then why do we need StringBuffer?
Untill now I pacified myself stating: its just a version of String with inbuilt buffer and dynamic size ability while compromising immutability and loosing its position of datatype.
But things changed when I started to optimize my code, and studied the relationship String and StringBuffer share.
Lets first discuss capabilities of each saperately to understand them and the relation they share.

String
String is one of the most powerful thing Java provided to programmers, it is a immutable object and treated as a datatype. Also it is the only object witch allows binary operation '+' and the assingnment operation '+='.
But still it is the Whipping Boy of Java performace, the reson behind is clear, it uses internally a char array to represent a String (witch takes heap memory, while all other datatype variable are created on stack) but there is no way to avoide Strings. Though there are things like String pool provided by JVM to share strings (internally by JVM not visible to user) but still '+' and '+=' costs a lot as each binary opration requires 3 string objects to be created (thus object creation cost as well as heap consumption).

StringBuffer
Its just like a String with an underlaying char array to store the string, but with dynamic extandability and utility methods like append, delete, insert and replace. With an option to define initial arraysize to reduce arraycopy calls. And to add to its stack - it is thread safe.
So are you getting something, its just like String but with methods to manipulate them more efficiently. As StringBuffer does not return a new objec for each manipulation operation like any append call will require only one StringBuffer object.

The Cousin Relationship
Use StringBuffer in manipulation and then use toString method to get the final String.
Well by now we all are smart enough to find out that, but wait a second its just a normal behaviour of any object's toString method where is the cousin relationship, do they really have Blood relation.
Yup they do share Blood, StringBuffers toString method returns a String object witch share the same underlaying array that of the StringBuffer, with a flag shared set to true. Thus no conversion cost is involved form converting a StirngBuffer to Stirng object.
They share the same char array till any other manipulation operation is performed on StirngBuffer, witch leads in creation of new underlaying array for StringBuffer object and invoking a System.arrayCopy method. The String object does not change and continue to point to the previous internal char array.

How does it happens
When toString method is called on StringBuffer it returns a new string as follows
return new String(this);

Thus calling constructor of String taking StringBuffer as argument, witch looks like this
public String (StringBuffer buffer) {
synchronized(buffer) {
buffer.setShared();
this.value = buffer.getValue();
this.offset = 0;
this.count = buffer.length();
}
}

Where setShared method set the StringBuffer's shared flag to true, witch in handled in every String manipulation opration in StringBuffer as
if (shared) copy();

Witch creates a new copy of char array and set StringBuffers storage pointer to it. (see copy methods code below)
private final void copy() {
char newValue[] = new char[value.length];
System.arraycopy(value, 0, newValue, 0, count);
value = newValue;
shared = false;
}

Singleton Pattern - "Guaranteed Single Object Model"

Introduction:
This blog only deals with implementing Singleton Pattern in Java.

Why? Whats the need?
Before we start, we must first understant the need of "single object model".
Say you want to access the printer/stream or any other resource then one would like to have a single handler for that resource to have greater controlling on its operations.
Or say you have an application accessing database, is it fine if all the modules of application access database with their own driver, in such conditions the singleton pattern is best suited.

Design Pattern and Singleton Pattern
Before we start our journy to singleton, lets first have a brife introduction about Design Patterns.
"Design Patterns are some guidelines, ganral rules, solution templates for common recurring problems"
And Singleton Pattern is a Creational Design Pattern - as it address and control cration of the object in question.
Definition: "A Design Pattern in witch at any time there is exactly one object/handler for any Class/Module"

Implementing Singleton in Java
According to definition for any class to be singleton there must not be more than one instance at any time, so first of all we must find a way to restrict users to create more instances of the object. This could be done my just making a private constructor, and providing a static method to access the instance of the object.

public class MySingleton{
private static MySingleton instance=null;
private MySingleton(){
//no code required - or any other initialisation code
}
public static MySingleton getInstance(){
if(instance==null){
instance=new MySingleton();
}
return instance;
}
//any other methods or code
}

So now no other class can create objects of this class and there can't be more than one instance, as we guarented in getInstance method. So is the job done? well yes and no - yes as it fulfills the definition and promery requirements and No as threading may create problems, and may lead to more than one instances.
Ohh thats now pretty easy, yeah we all know how to restrict threads to ruin all we have done, we all are smart people ;)
so just add synchronized keyword in the getInstance method, so our method signature becomes

public static synchronized MySingleton getInstance()

Well is it done yet? - Nop still one thing is there to trouble us, what if someone, someone having personal problem with your code ;), tries to make clone of your object, well you haven't given any clone method, but it is already there, remember Object.clone, and worst of all it has protected access so any subclass can be written and clone can be overridden, so just to avoid it add one clone method also.

So now we have our Singleton class with guaranted single object modle. Complete code for the class is below.

public class MySingleton{
private static MySingleton instance=null;
private MySingleton(){
//no code required - or any other initialisation code
}
public static synchronized MySingleton getInstance(){
if(instance==null){
instance=new MySingleton();
}
return instance;
}
public Object clone() throws CloneNotSupportedException{
throw new CloneNotSupportedException();
}
//any other methods or code
}

Using Volatile keyword efficiently

Java gives you a powerful keyword to address concurrent execution environments without writing your own mutual exclusion algorithm to monitor concurrent thread execution.


Though I must remind, this does come with some overheads and loss in efficiency, so before using volatile keyword one must fully understand the use, advantages and disadvantages of using volatile keyword.

What is Volatile?

Volatile is a Java modifier witch is a hint to the compiler about nature of the variable. It tells the compiler that given variable can be changed by something beyond the control of the program. It tells compiler to handle this variable differently than normal variables.

What different handling of Volatile variables done in Java?

When a variable is defined volatile, Java compiler and JVM do two things differently.

1. Compiler does not optimize the expressions and statements involving that variable.

2. Synchronization (concurrent execution with mutual exclusion) of the variable is done by JVM.

When to use Volatile and when to design your own monitor?

Now as we know volatile gives a hint to compiler that given variable can be changed by something beyond the control of the program, so then why Sun provided synchronization keyword.
As volatile keyword is handled by compiler and JVM, it involves some overheads and we don’t have control over handling of this critical section problem.

Thus when to use volatile and when to use custom monitors can be answered in this way:
Use Volatile keyword whenever the variable represents some type of system or environment dependent variable.
For Example: clocks and simple counters.

And use custom monitors for those variables witch are changed by some other code and not by the environment, or for those variables for witch you want greater control on concurrent execution.
For example: Linked List, Queues etc.


Tuesday, July 17, 2007

Java Code Optimizing Techniques

We have basic code optimization techniques, JIT and adaptive compilers to optimize, then why we need Java Code Optimization?
After using general optimization also there are many small language dependent things that consumes resources and can have catastrophic effect on our code.
And to understand why we need optimization we must know about what cost is associated with each operation we perform.
Cost associated with operations (Approx, where 1 unit minimum cost associated with any expression)

  1. Variable Declaration - 1 Unit
  2. Assignment Operation - 1 Unit
  3. Method Calls - 3 to 30 Units
  4. Object Invocation - 3000 to 5000 Units
  5. Broadening of Object - 1000 to 3000 Units
  6. Overloaded/Overridden Methods Calls - 500 to 3000 Units

Above figures just gives an idea about how much resources (time/memory etc.) we consume in a simple code.
To better concur the optimization problem we will discuss each bottleneck one by one, to get rid of them and how to tradeoff to improve performance.
Lets start with our largest enemy.
Object Creation
Problems Associated with Objects is

  • Excessive Memory Usage
  • Time Consumed For Creation
    • Memory Cycling by GC (Threshing)
    • Whole Object Graph Creation (All super classes must also be initialized)
  • Memory Leak - if not freed up.

Object Creation Guidelines

  1. Avoid creating objects in frequently used routines; possibly pass in reusable objects as parameters.
  2. Avoid creating too many object members, as they are instantiated at the time of creation of object.
  3. Pre-size collections - this could give significant performance boost, try to initialize a collection slightly larger than required on average basis.
  4. Reuse objects - thus avoiding new object creation and also garbage collection.
  5. Avoid intermediate object creation. Example -
    • Expression int x = new Integer (str).intValue(), where str="123"
    • Can be written as int x = Integer.parseInt(str).
  6. Avoid creation of objects using re-initialization, by reusing existing objects or cloning objects from existing ones.
  7. Lazy initialization - you may define objects in advance but try avoid instantiation of object until they are really required.

Second big monster family to trouble us - String and char array associated with each String object

Optimizing Strings
Problems with Strings

  1. String is immutable, thus no customizations can be done.
  2. Has underlying char array data store (a secondary object in Java), and usually Strings allocate more memory to them than required.
  3. Conversions to other Objects or types are costly.

But Sun did provided some great features to enhance performance

  1. Very efficient comparison methods, namely equals and equalsIgnoreCase, and they compare them by length and then from left to right.
  2. subString returns a new string object but they do share the same underlying array object just the start and offset location is changed.
  3. String has special relationship with StringBuffer class witch is more flexible and conversion from later to first does not cost much as they share same internal array with just a flag, shared, indicating that array is shared.

Okay we know it now, but how this could help us?
If we keep these in mind we can write code more efficient for example
from point 1: - While comparing two String tokens (of same length) the equals (or equalsIgnoreCase) will return faster as early as the difference is, so we can choose our literals in that way. That is choosing "APPLY_RUN_SCHEDULE" and "CANCEL_RUN_SCHEDULE" than using "RUN_SCHEDULE_APPLY" and "RUN_SCHEDULE_CANCEL" for greater performance.
From point 3: - While concatenation of strings it is more efficient to use append method of StringBuffer class and then converting the result to String. That is
return "abc" + "def" + "ghi"
can be written as
return (new StringBuffer()).append("abc").append("def").append("ghi")
or
return (new StringBuffer("abc")).append("def").append("ghi")
as the conversion from StringBuffer to String is not costly (remember special relation).

Optimizing Exceptions

Try-Catch block itself does not add any cost to the overall performance, but if at all an exception occurs a significant change in performance can be observed, and mostly it is due to cost of tracing done by JVM for the stack trace. This cost can be in terms of hundreds or thousands units of normal code execution.
Question: How to optimize?
Answer: The best way to optimize is to reuse the exception caught to throw to next level of code so that stack trace is not populated again.
And also we should not catch a more general exception in catch clause if we already know the type of exception we could get, as this would add up object generalization cost to the exception handling.

Casting

Casting involves convertibility validation cost and conversion cost. Usually primitive casting does not involve any cost but casting any object to some class or interface is costly (while casting to interface is more costly).
Most of the time casting is required while using collections, but one can use type specified collection classes to avoid casting. Also to avoid Exceptions (witch are very costly) one can use instanceof operator before casting.

Optimizing Variables

Some tips to use variables efficiently are

  1. Local and argument variable are cheaper as they consume only 4 bytes on the stack.
  2. Arrays are costly as additional overhead is required for checking boundary condition.
  3. Integer calculations are faster than char, long, byte or short, as they needed to rollup to the integer for calculations.
  4. Long and double are twice slower as they are platform dependent.

Optimizing Loops and Conditional flows

  1. General optimization techniques (code motion and loop unrolling)
  2. Termination point or condition should not be a method.
  3. Index or counter should be an int (not an char, short, byte or long)
  4. Counting down is faster that is i++ is faster than i--
  5. Use Short-Circuit (&&, ||) operators to combine conditions efficiently.

These are most common Java optimization guidelines, there are still many areas where Java can be optimized for greater performance, and I would like to leave them for you to search new possibilities and to come out with your own optimizing guidelines.

Default Optimization by Javac engine

Sometimes you would have wondered why code is not working!!!

You just made some changes in classes (some static and final values) but still have no use, it won’t reflect at all in the main application.

Well its not always that you did something wrong, it may be effect of default optimization done by “javac”, which you may not be aware of.

For Example:
Say you have a constants class, let one of the constants be DefaultArraySize as below

package mypack;
public class MyConstants{
/* some code here */
public static final int DefaultArraySize=10;
/* some code here */
}

And you are using it in some class say MyApp as following

import mypack.MyConstants;
public class MyApp{
public static void main (String [] args){
/* some code here */
String array[]=new String[MyConstants.DefaultArraySize];
System.out.println(“Array Size: ”+array.length());
}
}

Now compile and run both classes the output will be 10, here the problem starts try to change value of DefaultArraySize to something else say 20, compile MyConstants class again, now try to run MyApp class, so what’s the result 20 – well no its 10 again!!!

Project Manager: So what went wrong?
Intelligent Developer: Don’t know why, JVM is behaves strange sometimes. S sometimes just rebuilding whole application works.

Clearly our too intelligent code guru doesn’t know the exact problem, but anyways this will not stop him as just rebuilding of app will do the trick, but this is just a small problem think how many things can be there, which can go just unnoticed!!!

Frustrated Developer: Why the hell they wrote a Java compiler is such a way? Do they have personal problem with me?
Answer: Well Sun didn’t wrote Javac in such way to trouble our code guru, it’s something they optimized to give you better performance, its not at the cost of developing, its just because our developer is not coding in smart way.

So now we are at a stage to discuss optimizations done by Javac (we will discuss about most common optimizations done by Javac), which can be worthwhile to keep in mind while coding.

But before we go on to talk about optimizations done by Javac, we should keep it in mind that Javac is not an full power optimization engine, it provide some common optimizations only, so we can’t fully depend on Javac alone if at all we plan to optimize our code automatically later. (See my blog for Java Performance Tuning to get more information about why optimization required, how to optimizing your code, and some off the shelf Java optimizing compilers)

Optimizations done by Javac

  1. Constant Removal
  2. Inlineing function calls
  3. Changing/Removing code

Constant Removal: Javac removes a constant if it is involved in some calculation with other constant and returns a single value. For Example-

Original Code: int intVar=10*9;
Optimized by compiler: int intVar=90;

Original Code: String str=”Some Text: “+ (9*10);
Optimized by compiler: String str=”Some Text: 90”;

Original Code: String str=”Some Text: “+ Constnats.someConst; //say its value is 140
Optimized by compiler: String str=”Some Text: 140”;

Inlining function calls: Function calls to private members of the same class are inlined also calls to static members of other classes are inlined.

Changing/Removing code: Javac sometimes alters the code flow to optimize the performance. For Example -

Suppose you built a customized logger, to log all the messages, and say you are having a static variable as
public static final boolean Debug=true;

And you are using it as

If (MyConstants.Debug){
System.out.println(“some debug data”);
}

You compiled this code, and now you changed the value of Debug to false what will happen, now as you are smart enough the constant removal (see 1) will change the code to if condition with true value and the changes will not be affected, yes you are correct. But Javac goes to one more step and removes the whole condition from there, thus the compiled byte code will have code simmilar to what will be produced if below code is compiled

System.out.println(“some debug data”);

That is “if (true)” is removed so that JVM needs to run one less code of line and thus enhancing the performance a bit more.

With this I would like to close this Blog with hope it helped you all.

There are also many other optimizations that are used by commercial Optimizing compilers; you can refer to my Java Performance Tuning Blog for more details.

Code Optimization Techniques

“Premature optimization is root of all evils.” - Donald Knuth.

Thus one should first find out if she really wants to optimize her code, and usually there are more reasons to decide not to optimize the code.
The common reasons why one should not optimize are below

  • Optimization can introduce more bugs.
  • Optimization will surely reduce understandability and readability of code.
  • Optimization will make it difficult to maintain and extend the code.
  • Time spent in optimization may not be that much effective in overall performance of the application.
  • At last optimizing code may minimize portability of code.

But if optimization is evil why we need it?
Most of the time code consumes lot of resources such as memory, CPU cycles, bandwidth etc, thus minimizing overall performance.
We have following options to achieve this target

  1. Using more efficient algorithm.
  2. Using Manual optimization.
  3. Using optimizing compilers.
  4. Using JIT or adaptive compilers.

Using More Efficient Algorithm:
The most common example would be "sum of n natural numbers"
Instead of using
for (int i=0; i<n; i++){
sum += i;
}
We can use
sum = n*(n+1)/2;
But usually we would be using the best algorithm available or might be this is too late to change the algorithm.
Point to remember: It’s better to calculate performance of any algorithm at initial stage of any implementation.


Using Manual Optimization:
As its clear from the name, optimizations are done manually and needs a lot of man power, thus the best way to plan and carry out optimization is to include optimization early in product development and using these while coding, instead of altering completed code/modules.

1. General Optimization Techniques
These techniques are general and language independent, these are simple techniques witch is useful in achieving optimization by just simple restructuring or modification in the code.


Strength Reduction: It’s a method of replacing an operation with an equivalent but more efficient operation. For example – using shift operation instead of multiplication and division, that is using x>>1 instead of x/2 and x<<1.
Common Sub-Expression Removal: Removing redundant calculations.
For example
float surfaceArea=6*lengt*width;
float volume=length*width*height;
Can be changed in
float area=length*width;
float surfaceArea=6*area;
float volume=area*height;



Code Motion: Moving the code, witch yields a constant output, out of loop/block where the calculation is done, so that it is calculated only once.
For Example
for (int r=1;r<10;r++){
totalarea=(22/7)*r*r;
}
Can be changed in
double pi=22/7;
for (int r=1;r<10;r++){
totalarea=pi*r*r;
}

Unrolling Loops: Doing more calculations in single iteration and reducing the number of iterations required. Good amount of optimization can be achieved through loop unrolling.
For example - if we know that length of the array is always multiple of 5 than
for (int i=1;i<10;i++){
someArray[i]=i;
}
Can be changed in
for (int i=1;i<10;i+=5){
someArray[i]=i;
someArray[i+1]=i+1;
someArray[i+2]=i+2;
someArray[i+3]=i+3;
someArray[i+4]=i+4;
}


2. Advanced Optimization Techniques (Java)
Please See: Optimizing Java Code

3. Using optimizing compilers
There are many commercial optimizing compilers to help, but one should first understand full working of optimizing compiler, and then what else that particular compiler offers and what are the risks involve in using that compiler.
Remember its not a good idea to use optimizing compiler just at the time of delivery, it would be better to use it from the beginning so that you will be able to know issues and limitations of the compiler.
Please See: Default Optimization by Javac Compiler.

4. Using JIT or adaptive compilers
Javac compiles the code in the byte code to enable compile once run anywhere feature, but JVM looses good amount of time because of overheads involved in providing these feature at runtime. JVM has to interpret byte code and convert it to machine command to give us portability, sometimes the thing witch we didn’t wanted at all, suppose the server is going to be only on given platform say Linux.

JIT compilers comes to help us in these situations, these compilers compiles the Java code into machine dependent instruction code thus giving high efficiency.

Adaptive Compiler are yet another breed of compiles based on Java HotSpot technology with concentrate on 80-20 rule and try to find and optimize the 20 percent of code witch is high resource consuming and referenced repetitively, these compilers use transformations to optimize code, they learn from given samples and adapt themselves to produce results for given environmental conditions.