How does Java store Strings and how does substring work internally?

class StringTesting {
    public static void main(String[] args) {
        String str = "abcd";
        String str1 = new String("abcd");
        String str2 = str.substring(0,2);
        String str3 = str.substring(0,2);
        String str4 = str.substring(0,str.length());
        String str5 = str1.substring(0,2);
        String str6 = str1.substring(0,2);
        String str7 = str1.substring(0,str1.length());

        System.out.println(str2 == str3);
        System.out.println(str == str4);
        System.out.println(str5 == str6);
        System.out.println(str1 == str7);
    }
}

Here is the output I get on java 1.6.0_27:


Can someone please explain the output? I know Java differentiates between Strings stored on the heap and Strings stored in the String "common pool" (which holds interned strings). Internally, how do their representations differ, and how does that change the substring algorithm? Kindly cite books, articles, blogs, etc. wherever appropriate.


See the comments:

    String str = "abcd";  // new String LITERAL which is interned in the pool
    String str1 = new String("abcd"); // new String, not interned: str1 != str
    String str2 = str.substring(0,2); // new String which is a view on str
    String str3 = str.substring(0,2); // same: str3 != str2
    String str7 = str1.substring(0,str1.length()); // special case: str1 is returned


  • Since Java 7u6, substring copies the selected characters into a new char[] instead of returning a view that shares the original string's array (that makes no difference to the == results in this example)
  • There is a special case when you call str1.substring(0,str1.length()): the string itself is returned. See the code:

    public String substring(int beginIndex, int endIndex) {
        // bounds checks elided; they also compute subLen = endIndex - beginIndex
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }
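
The special case can be checked directly with reference comparisons. The following is a minimal sketch; the class and method names are made up for illustration, and the results reflect how the JDK implementations quoted here behave rather than anything the String Javadoc promises.

```java
// Hypothetical demo class; names are illustrative only.
final class SubstringIdentityDemo {

    // substring(0, length()) hits the special case and returns the receiver itself.
    static boolean fullRangeReturnsSameObject(String s) {
        return s.substring(0, s.length()) == s;
    }

    // Two identical partial substring calls each build their own String object.
    static boolean partialRangeCreatesDistinctObjects(String s) {
        return s.substring(0, 2) != s.substring(0, 2);
    }

    public static void main(String[] args) {
        String heapString = new String("abcd"); // explicitly not the pooled literal
        System.out.println(fullRangeReturnsSameObject(heapString));
        System.out.println(partialRangeCreatesDistinctObjects(heapString));
    }
}
```

The same holds whether the receiver is a pooled literal or a `new String(...)`: substring never consults the pool, it only looks at the indices.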


What is a view?

Until Java 7u6, a String was basically a char[] holding the characters, plus an offset and a count (i.e. the string is made of count characters starting at position offset in the char[]).

When calling substring, a new string was created with the same char[] but a different offset / count, effectively creating a view on the original string. (Except when count = length and offset = 0, as explained above.)

Since Java 7u6, a new char[] is created every time, because the String class no longer has count or offset fields.
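
To make the difference concrete, here is a rough model of the two strategies. ViewString is a hypothetical class written for this explanation, not the real java.lang.String source; only the value / offset / count field names follow the description above.

```java
import java.util.Arrays;

// Hypothetical model of the pre-7u6 String layout; NOT the real JDK class.
final class ViewString {
    private final char[] value; // backing array, possibly shared with other instances
    private final int offset;   // index in value of this string's first character
    private final int count;    // number of characters in this string

    ViewString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private ViewString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Pre-7u6 style: constant time, the result shares the backing array.
    ViewString substringShared(int begin, int end) {
        checkBounds(begin, end);
        if (begin == 0 && end == count) {
            return this; // the special case discussed above
        }
        return new ViewString(value, offset + begin, end - begin);
    }

    // 7u6+ style: linear time, the result owns a fresh copy of the characters.
    ViewString substringCopied(int begin, int end) {
        checkBounds(begin, end);
        if (begin == 0 && end == count) {
            return this;
        }
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new ViewString(copy, 0, copy.length);
    }

    boolean sharesArrayWith(ViewString other) {
        return value == other.value;
    }

    private void checkBounds(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException("begin " + begin + ", end " + end);
        }
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        ViewString s = new ViewString("abcd");
        System.out.println(s.substringShared(1, 3) + " shared: " + s.substringShared(1, 3).sharesArrayWith(s));
        System.out.println(s.substringCopied(1, 3) + " shared: " + s.substringCopied(1, 3).sharesArrayWith(s));
    }
}
```

The shared variant is cheap but keeps the whole original array alive for as long as any small substring is reachable; the copying variant costs time and memory per call but lets the original be collected.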

Where is the common pool stored exactly?

This is implementation specific. The location of the pool has actually moved over time: in the HotSpot JVM it lived in the permanent generation up to Java 6, and since Java 7 it is stored on the main heap.

How is the pool managed?

Main characteristics:

  • String literals are stored in the pool
  • Interned strings are stored in the pool (new String("abc").intern();)
  • When a string S is interned (because it is a literal or because intern() is called), the JVM returns a reference to the string in the pool that is equal to S if there is one, and otherwise adds S to the pool (hence "abc" == "abc" always returns true).
  • Strings in the pool can be garbage collected (meaning that an interned string may be removed from the pool once nothing else references it)
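
These pool rules can be observed with plain reference comparisons. A small sketch follows; the class and method names are invented for the example, while the behaviour itself is what the language specification and the intern() Javadoc describe.

```java
// Hypothetical demo class; names are illustrative only.
final class StringPoolDemo {

    // Two identical literals refer to the same pooled instance.
    static boolean literalsShareInstance() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // new String(...) always creates a separate object outside the pool.
    static boolean newStringIsDistinctFromLiteral() {
        return new String("abc") != "abc";
    }

    // intern() hands back the pooled instance that equals the receiver.
    static boolean internReturnsPooledInstance() {
        return new String("abc").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());          // true
        System.out.println(newStringIsDistinctFromLiteral()); // true
        System.out.println(internReturnsPooledInstance());    // true
    }
}
```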

String is an immutable object.

String#substring creates a new String.

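In code (this listing matches the Java 7u6 and later sources, which check value.length rather than the offset / count fields used in OpenJDK 6):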

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }
