Monday, September 19, 2011

Java Hashtable

Hashtable is an implementation of a key-value pair data structure in java. You can store and retrieve a ‘value’ using a ‘key’ and it is an identifier of the value stored. It is obvious that the ‘key’ should be unique.
java.util.Hashtable extends Dictionary and implements Map. Objects with non-null value can be used as a key or value. Key of the Hashtable must implement hashcode() and equals() methods. By the end of this article you will find out the reason behind this condition.

Generally a Hashtable in java is created using the empty constructor Hashtable(). Which is a poor decision and an often repeated mistake. Hashtable has two other constructors

Hashtable(int initialCapacity)

and

Hashtable(int initialCapacity, float loadFactor)

. Initial capacity is number of buckets created at the time of Hashtable instantiation. Bucket is a logical space of storage for Hashtable.

Hashing and Hashtable

Before seeing java’s Hashtable in detail you should understand hashing in general. Assume that, v is a value to be stored and k is the key used for storage / retrieval, then h is a hash function where v is stored at h(k) of table. To retrieve a value compute h(k) so that you can directly get the position of v. So in a key-value pair table, you need not sequentially scan through the keys to identify a value.
h(k) is the hashing function and it is used to find the location to store the corresponding value v. h(k) cannot compute to a indefinite space. Storage allocated for a Hashtable is limited within a program. So, the hasing function h(k) should return a number within that allocated spectrum (logical address space).

Hashing in Java

Java’s hashing uses uses hashCode() method from the key and value objects to compute. Following is the core code from Hashtable where the hashCode ‘h’ is computed. You can see that both key’s and value’s hashCode() method is called.

h += e.key.hashCode() ^ e.value.hashCode();

It is better to have your hashCode() method in your custom objects. String has its own hashCode methode and it computes the hashcode value as below:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

If you don’t have a hashCode() method, then it is derived from Object class. Following is javadoc comment of hashCode() method from Object class:

* Returns a hash code value for the object. This method is
   * supported for the benefit of hashtables such as those provided by
   * <code>java.util.Hashtable</code>.

If you are going to write a custom hashCode(), then follow the following contract:

* The general contract of <code>hashCode</code> is:
    * <ul>
    * <li>Whenever it is invoked on the same object more than once during
    *     an execution of a Java application, the <tt>hashCode</tt> method
    *     must consistently return the same integer, provided no information
    *     used in <tt>equals</tt> comparisons on the object is modified.

The following is to improve performance of the Hashtable.

* <li>If two objects are equal according to the <tt>equals(Object)</tt>
   *     method, then calling the <code>hashCode</code> method on each of
   *     the two objects must produce the same integer result.

hashCode() guarantees distinct integers by using the internal address of the object.

Collision in Hashtable

When we try to restrict the hashing function’s output within the allocated address spectrue limit, there is a possibility of a collision. For two different keys k1 and k2, if we have h(k1) = h(k2), then this is called collision in hashtable. What does this mean, our hashing function directs us store two different values (keys are also different) in the same location.
When we have a collision, there are multiple methodologies available to resolve it. To name a few hashtable collision resolution technique, ‘separate chaining’, ‘open addressing’, ‘robin hood hashing’, ‘cuckoo hashing’, etc. Java’s hashtable uses ‘separate chaining’ for collision resolution in Hashtable.

Collision Resolution in java’s Hashtable

Java uses separate chaining for collision resolution. Recall a point that Hashtable stores elements in buckets. In separate chaining, every bucket will store a reference to a linked list. Now assume that you have stored an element in bucket 1. That means, in bucket 1 you will have a reference to a linked list and in that linked list you will have two cells. In those two cells you will have key and its corresponding value. Hashtable Collision

Why do you want to store the key? Because when there is a collision i.e., when two keys results in same hashcode and directs to the same bucket (assume bucket 1) you want to store the second element also in the same bucket. You add this second element to the already created linked list as the adjacent element.
Now when you retrieve a value it will compute the hash code and direct you to a bucket which has two elements. You scan those two elements alone sequentially and compare the keys using their equals() method. When the key mathches you get the respective value. Hope you have got the reason behind the condition that your object must have hashCode() and equals() method.
Java has a private static class Entry inside Hashtable. It is an implementation of a list and you can see there, it stores both the key and value.

Hashtable performance

To get better performance from your java Hashtable, you need to
1) use the initialCapacity and loadFactor arguments
2) use them wisely
while instantiating a Hashtable.
initialCapacitiy is the number of buckets to be created at the time of Hashtable instantiation. The number of buckets and probability of collision is inversly proportional. If you have more number of buckets than needed then you have lesser possibility for a collision.
For example, if you are going to store 10 elements and if you are going to have initialCapacity as 100 then you will have 100 buckets. You are going to calculate hashCoe() only 10 times with a spectrum of 100 buckets. The possibility of a collision is very very less.
But if you are going to supply initialCapacity for the Hashtable as 10, then the possibility of collision is very large. loadFactor decides when to automatically increase the size of the Hashtable. The default size of initialCapacity is 11 and loadFactor is .75 That if the Hashtable is 3/4 th full then the size of the Hashtable is increased.
New capacity in java Hashtable is calculated as follows:

int newCapacity = oldCapacity * 2 + 1;

If you give a lesser capacity and loadfactor and often it does the rehash() which will cause you performance issues. Therefore for efficient performance for Hashtable in java, give initialCapacity as 25% extra than you need and loadFactor as 0.75 when you instantiate.

Difference between String and StringBuffer/StringBuilder in Java

Well, the most important difference between String and StringBuffer/StringBuilder in java is that String object is immutable whereas StringBuffer/StringBuilder objects are mutable.
By immutable, we mean that the value stored in the String object cannot be changed. Then the next question that comes to our mind is “If String is immutable then how am I able to change the contents of the object whenever I wish to?” . Well, to be precise it’s not the same String object that reflects the changes you do. Internally a new String object is created to do the changes.
So suppose you declare a String object:
String myString = “Hello”;
Next, you want to append “Guest” to the same String. What do you do?
myString = myString + ” Guest”;
When you print the contents of myString the output will be “Hello Guest”. Although we made use of the same object(myString), internally a new object was created in the process. So, if you were to do some string operation involving an append or trim or some other method call to modify your string object, you would really be creating those many new objects of class String.
Now isn’t that a performance issue?
Yes, it definitely is.
Then how do you make your string operations efficient?
By using StringBuffer or StringBuilder.
How would that help?
Well, since StringBuffer/StringBuilder objects are mutable, we can make changes to the value stored in the object. What this effectively means is that string operations such as append would be more efficient if performed using StringBuffer/StringBuilder objects than String objects.
Finally, whats the difference between StringBuffer and StringBuilder?
StringBuffer and StringBuilder have the same methods with one difference and that’s of synchronization. StringBuffer is synchronized( which means it is thread safe and hence you can use it when you implement threads for your methods) whereas StringBuilder is not synchronized( which implies it isn’t thread safe).
So, if you aren’t going to use threading then use the StringBuilder class as it’ll be more efficient than StringBuffer due to the absence of synchronization.
Incase you do not know – Here’s how you use StringBuilder
A simple Example to demonstrate that String object is Immutable
Incase you still have any doubts regarding String or StringBuilder then do leave a comment. I’ll be more than eager to help you out.
Note: StringBuilder was introduced in Java 1.5 (so if you happen to use versions 1.4 or below you’ll have to use StringBuffer)

Java String Concatenation

ref:http://javapapers.com/core-java/java-string-concatenation/

You have been told many times, don’t use + (java plus operator) to concatenate Strings. We all know that it is not good for performance. Have you researched it? Do you know what is happening behind the hood? Lets explore all about String concatenation now.
In the initial ages of java around jdk 1.2 every body used + to concatenate two String literals. When I say literal I mean it. Strings are immutable. That is, a String cannot be modified. Then what happens when we do

String fruit = "Apple"; fruit = fruit + "World";

In the above java code snippet for String concatenation, it looks like the String is modified. It is not happening. Until JDK 1.4 StringBuffer is used internally and from JDK 1.5 StringBuilder is used to concatenate. After concatenation the resultant StringBuffer or StringBuilder is changed to String.
When java experts say, “don’t use + but use StringBuffer”. If + is going to use StringBuffer internally what big difference it is going to make in String concatenation? Look at the following example. I have used both + and StringBuffer as two different cases. In case 1, I am just using + to concatenate. In case 2, I am changing the String to StringBuffer and then doing the concatenation. Then finally changing it back to String. I used a timer to record the time taken for an example String concatenation.
Look at the output (if you run this java program the result numbers might slightly vary based on your hardware / software configuration). The difference between the two cases is astonishing.
My argument is, if + is using StringBuffer internally for concatenation, then why is this huge difference in time? Let me explain that, when a + is used for concatenation see how many steps are involved:

A StringBuffer object is created
string1 is copied to the newly created StringBuffer object
The “*” is appended to the StringBuffer (concatenation)
The result is converted to back to a String object.
The string1 reference is made to point at that new String.
The old String that string1 previously referenced is then made null.

Hope you understand the serious performance issues and why it is important to use StringBuffer or StringBuilder (from java 1.5) to concatenate Strings.
Therefore you can see initially it was +, then StringBuffer came and now StringBuilder. Surely Java is improving release by release!

Example Java Source Code For String Concatenation

class Clock {
 
  private final long startTime;
 
  public Clock() {
    startTime = System.currentTimeMillis();
  }
 
  public long getElapsedTime() {
    return System.currentTimeMillis() - startTime;
  }
}
 
public class StringConcatenationExample {
 
  static final int N = 47500;
 
  public static void main(String args[]) {
 
    Clock clock = new Clock();
 
    //String to be used for concatenation
    String string1 = "";
    for (int i = 1; i <= N; i++) {
 
      //String concatenation using +
      string1 = string1 + "*";
    }
    //Recording the time taken to concatenate
    System.out.println("Using + Elapsed time: " + clock.getElapsedTime());
 
    clock = new Clock();
    StringBuffer stringBuffer = new StringBuffer();
    for (int i = 1; i <= N; i++) {
 
      //String concatenation using StringBuffer
      stringBuffer.append("*");
    }
    String string2 = stringBuffer.toString();
    System.out.println("Using StringBuffer Elapsed time: " + clock.getElapsedTime());
 
  }
}

Output For The Above Example Program For String Concatenation

Using + Elapsed time: 3687

Using StringBuffer Elapsed time: 16

All About Java

Pages