Java Recruitment Questions - hashCode() method explained in detail


11 Feb 2020  Michal Fabjanski  14 mins read.

Questions about hashCode() are asked on most interviews (from junior to senior level). In this post, you will find out what this method is, how it works for particular types, how it works for primitives, what you have to watch out for during your implementations. Enjoy reading!

What is hashCode() method

hashCode() is a method that should contain a hash algorithm that calculates a 32-bit integer (hash) from the object data, representing that object.

This hash is used by some collections (e.g. HashMap and HashSet) to store references to objects, so that access to data is very fast. Access time depends on the performance of our hash algorithm, so it is worth paying attention to it!

Do java primitive types have their hashcode()?

Primitive types are not treated the same as ordinary objects. It is not possible to call the hashCode() method on a primitive type. The only option is to use wrappers such as Integer.hashCode(3). Integer.hashCode() returns the value of the int we passed on to the method (in our case 3). Implementation of hashCode from Integer class:

public static int hashCode(int value) {
    return value;    
}

Similar hashcode() functionality exists in the Short and Byte classes. First it casts short/byte to the int and returns its value.
A little different hashCode() is in Long. Since Long is a 64-bit number we have to do some “compression” so that we can return the int value:

         
public static int hashCode(long value) {    
    return (int)(value ^ (value >>> 32));    
}  

The >>> is the right shift bit operation. value >>> 32means that we shift 32 bits to right. It leave only 32 bits - and the rest of the position is completed by 0 (32 rightmost bits are discarded). Then, on shifted value a XOR operation is performed ^.

Float numbers are more complicated. Java convert its floating-point values to an equicalent integer value using Float.floatToIntBits(float value). The method also checks whether the value is NaN (Not-a-number):

     
public static int hashCode(float value) {    
    return floatToIntBits(value);    
}    
  
public static int floatToIntBits(float value) {    
    if (!isNaN(value)) {    
        return floatToRawIntBits(value);    
  }    
    return 0x7fc00000;    
}    
    
public static boolean isNaN(float v) {    
    return (v != v);    
}    

public static native int floatToRawIntBits(float var0);  

Implementation of floatToRawIntBits() depend on the JVM/JDK you are using. For openJDK the source is here.

Double convert its floating-point value to long using doubleToLongBits() then does the same XOR trick as Long.

Does the Java enums have its hashCode() ?

Enums uses Object’s hashCode() . This is the only way to use hashCode() in the enum.

     
public final int hashCode() {    
    return super.hashCode();    
}  

In addition, Enum’s hashcode() is final to prevent developers from creating their own implementations. It is because within one JVM there can exist only one instance of each enum object (enums are singletons). This is enough to ensure that such implementation makes sense and is correct.

hashCode() method in Java arrays

Arrays do not provide its hashCode() implementation. Just as enums use a method from the Object class.
It can cause a lot of problems because this hash depends on the magic described below. A better practice is to use hashCode() from the helper class - Arrays. Arrays.hashCode() method returns a hash code based on the contents of the specified array. If the contents of the two arrays are equal then return the same value. You can see an example below:

     
int [] numbers1 = new int[]{1,2,3,4,5,6};    
int [] numbers2 = new int[]{1,2,3,4,5,6};    
    
System.out.println(numbers1.hashCode()); // return 1595428806    
System.out.println(numbers2.hashCode()); // return 1072408673    
    
System.out.println(Arrays.hashCode(numbers1)); // return 918073252    
System.out.println(Arrays.hashCode(numbers2)); // return 918073252  
    

Arrays class has several hashCode() methods - for primitive types (e.g. hashCode(float a[]), hashCode(int a[])), which inside use type specific hashcode() from the wrapper classes. For other Objects it use:

    
public static int hashCode(Object a[]) {    
    if (a == null)    
        return 0;    
    
 int result = 1;    
    
 for (Object element : a)    
        result = 31 * result + (element == null ? 0 : element.hashCode());    
    
 return result;    
}  

As you can see, this method adds all the hash from all elements by multiplying them by a multiplier - which in this case is 31. You may be wondering why we need a multiplier of 31. Thanks to the multiplier, arrays with the same elements but different order has different hashcodes. For example, having a table with the elements [10,20] and [20,10] - without a multiplier, hashcode for the first table would be: 1 + 10 + 20 = 30, and for the second table: 1 + 20 + 10 = 30. Both tables would have the same hashcode (30) even though the order of their elements is different. With a multiplier, their hashcode is - for the first array: (31*1 + 10) *31+20 = 1291 and for the second array: (31*1 + 20) *31+10 = 1591

If you are interested in why the multiplier is 31 - this is described in detail in the article. I recommend reading it!

hashCode() method in List, Map, Set

ArrayList and LinkedList use the same algorithm as Arrays.hashCode() for the same reason. Set, unlike List, uses hashcode() without multiplier:

    
public int hashCode() {    
    int h = 0;    
  Iterator<E> i = iterator();    
 while (i.hasNext()) {    
        E obj = i.next();    
 if (obj != null)    
            h += obj.hashCode();    
  }    
    return h;    
}  

In Set a multiplier is not needed - as the order of the elements does not matter (it is the same). This makes the hashCode calculation a bit faster.

In Maps, hashCode adds all element hashes (element hash is xor from the key hash and hash values).

hashCode() method in String

This is String class hashCode() implementation:

    
public int hashCode() {    
 if (h == 0 && !hashIsZero) {    
        h = isLatin1() ? StringLatin1.hashCode(value)    
                       : StringUTF16.hashCode(value);    
 if (h == 0) {    
            hashIsZero = true;    
  } else {    
            hash = h;    
  }    
    }    
    return h;    
}  

If the string has Latin encoding, the StringLatin1.hashCode method is used, and if the string is UTF-16 encoding, the hashcode will be generated in tringUTF16.hashCode. The difference is that in latin1 each character is exactly one byte long. In utf8 a character can consist of more than one byte. Inside, the hashCode calculating functionality is the same as in List or Arrays.hashCode().
The advantage of the hashCode from the String class is that the hashcode value is cached and calculated lazily during the first call of the method.

hashCode() method in java Objects

If your class hasn’t overridden the hashCode() method, then it will use the one defined in its super class, probably in the Object class. hashCode() in Object class is a native function (is dependant on the JVM). Documentation says :

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

It is hard to guess how exactly hashCode() works, as its implementation is placed in the native code of the JVM. In many places, you can find information that hashCode() is simply an address in memory. Given that the JVM can relocate objects (e.g. during garbage collection cycles due to promotion or compaction) - after we calculate an object’s identity hash we must be able to retain it in a way that survives object relocation.
One solution is to write the hashcode calculated during the first call of the hashCode() method into the object data (e.g object header). This way, if the object is moved to another memory area, it will still have the same, original hashcode. But despite the fact that it is often believed that - this is not the functionality of the most popular JVM implementation - OpenJDK…

How is the hashCode calculated? hashCode() JVM implementation

If we go deeper, to the JVM code (the implementation can be found here - line 511) we can deduce that the hashCode is not calculated from the address in the memory but from the thread state! You can find proof of that on StackOverflow.
There are two important things to remember. First of all, such hashCode calculating exist from OpenJDK 8. Secondly, hashCode calculating is implementation-dependent. In this post we checked OpenJDK. For comparison - JVM Zing implementation calculates hashCode based on a memory address.
This can be considered a tricky recruitment question! :)

hashCode() own implementations

Finally, it is important to mention the good practices when creating your own hashCode().
I will share with you the 3 most common ways to create your own hashCode():

  1. Using IDE (e.g. generating in IntelliJ, you can press ALT + Insert and select “equals and hashCode” from the context menu).
  2. Using libraries. For example, Lombok library, which will generate hashCode() during compilation.
  3. Writing your own implementation of hashCode() - you must be careful not to make a mistake. A wrong hashCode() can spoil the effectiveness of HashMap and bring many problems!

A simple example! Everyone likes simple examples! We have a well-known Person class:

    
public class Person {

    private String name;
    private String surname;

    public Person(String name, String surname) {
        this.name = name;
        this.surname = surname;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, surname);
    }
}  

Remember that when overwriting the hashCode() method you must also overwrite the equals() method, so as not to break the hashCode() & equals() contract. The equals and hashCode methods should meet the following conditions:
If x == y then x.equals(y) == true
If x.equals(y) == true then x.hashCode() == y.hashCode()
If x.hashCode() == y.hashCode() then x.equals(y) can either return true or return false
The last point is based on int scope (hashCode() returns int value). As a result, hashCode() can return a finite number of results: from -2 147 483 648 to +2 147 483 647 - so, around 4 billion unique values. Let’s imagine, that we are building the application, that keeps more data. If we start invoking the hashCode() method from more than 4 billion elements, it is possible, that first 4 billion ones will produce a unique result, but each next call will produce the integer value, that has been produced before. Therefore, it is important to combine hashCode() with equals() to make sure that objects that have the same hashcodes are not different.

Additional hashCode() doubts

Do I have to implement hashCode/equals in each class?

Answer: No, if you don’t expect to use the equals method (e.g., aren’t using assertEquals from JUnit, or never mean to use this class as a key in a Map, etc), the implementation will not be used -> it not necessary.

What will be the effect of badly implemented hashCode()

Anwer: Even if in hashCode you will return the same int for each element, e.g:

    
@Override    
public int hashCode() {    
    return 123;    
}  

Your program will compile and run. The problem can be when you want to use collections using hashCode (HashMap, HashSet or other). In the above example hashCode - all elements will be placed into the same bucket in HashMap (as LinkedList elements). Effectiveness will decrease significantly.

Does null have its hashCode?

Of course you can’t call the hashCode() method (like any other method) on null value. But we can pass null to Objects.hashCode():

    
public static int hashCode(Object o) {  
    return o != null ? o.hashCode() : 0;  
}  

As you can see, Objects.hashCode() for null will return 0. You’re probably wondering how HashMap’s doing in that case. HashMap has a special handling for null keys. null values are fine as you don’t compute hash code for them in a HashMap. This is the reason why null keys work fine in HashMap.

Summary

I hope you liked the article. If you have any questions, please feel free to comment. Thank you for your comments under the last post :)