Complete Guide to Java String Substring
In this article, we are going to shed light on the Java String substring() method.
In Java, String is a class that represents a sequence of characters. However, sometimes we are not interested in all of those characters: to solve a given problem, we may only need to work with some of them.
So, this is where the substring() method comes to the rescue.
What is a substring?
A substring, as the name implies, is a contiguous sequence of characters within a particular string.
In short, every contiguous portion of an original string is called a substring.
For example, “devwithus” contains, among others, the following substrings: “dev”, “with”, “us”…
Now that we know what a substring is, let’s see how to create and manipulate it.
Fortunately, the String class provides a handy method called substring() specifically for extracting substrings.
So, let’s dig deep and take a close look at it.
String.substring() Syntax Variants in Java
Basically, there are two overloaded variants of the substring() method:
public String substring(int beginIndex): this version accepts only one parameter
public String substring(int beginIndex, int endIndex): this variant accepts two parameters
For the method’s arguments, we have:
beginIndex: denotes the starting index, which is inclusive in both variants
endIndex: denotes the ending index, which is exclusive
Please bear in mind that both methods can throw IndexOutOfBoundsException (more precisely, its subclass StringIndexOutOfBoundsException).
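To see the two variants side by side, here is a minimal sketch on the string “devwithus”. The class name SubstringBasics is just an illustrative choice; the point is that beginIndex is included, endIndex is excluded, and the result has endIndex - beginIndex characters.

```java
class SubstringBasics {
    public static void main(String[] args) {
        String text = "devwithus";

        // One-parameter variant: from beginIndex 3 (inclusive) to the end
        String tail = text.substring(3);

        // Two-parameter variant: from beginIndex 3 (inclusive) to endIndex 7 (exclusive)
        String middle = text.substring(3, 7);

        System.out.println(tail);    // withus
        System.out.println(middle);  // with
        // The result always has endIndex - beginIndex characters
        System.out.println(middle.length() == 7 - 3); // true
    }
}
```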
When Does substring() Throw IndexOutOfBoundsException?
beginIndex must be greater than or equal to zero and less than or equal to the length of the string.
Also, endIndex must be greater than or equal to beginIndex and less than or equal to the length of the string.
Otherwise, the substring() method throws IndexOutOfBoundsException.
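The bounds rules are easiest to remember by testing the edge cases. In this sketch, tryRange is a hypothetical helper that either returns the quoted substring or reports the exception. Note that a beginIndex equal to length() is valid and simply yields an empty string, while a negative beginIndex, a beginIndex greater than endIndex, or an endIndex greater than length() all throw.

```java
class SubstringBounds {
    // Hypothetical helper: runs substring(begin, end) and reports the result or the exception
    static String tryRange(String s, int begin, int end) {
        try {
            return "\"" + s.substring(begin, end) + "\"";
        } catch (StringIndexOutOfBoundsException e) {
            return "StringIndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        String s = "dev"; // length() == 3
        System.out.println(tryRange(s, 0, 3));   // "dev" (whole string)
        System.out.println(tryRange(s, 3, 3));   // "" (begin == length is allowed)
        System.out.println(tryRange(s, -1, 2));  // StringIndexOutOfBoundsException (negative begin)
        System.out.println(tryRange(s, 2, 1));   // StringIndexOutOfBoundsException (begin > end)
        System.out.println(tryRange(s, 0, 4));   // StringIndexOutOfBoundsException (end > length)
        System.out.println(s.substring(3).isEmpty()); // true: substring(length()) returns ""
    }
}
```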
substring(int beginIndex) Example
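The extracted portion starts with the character at index beginIndex and extends to the end of the string.
Let’s see how this method is implemented in the JDK. The snippet below comes from the OpenJDK source of Java 9 and later, which store strings in a compact form (hence the isLatin1() check); the exact code varies slightly between releases: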
public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    int subLen = length() - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    if (beginIndex == 0) {
        return this;
    }
    return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
                      : StringUTF16.newString(value, beginIndex, subLen);
}
For instance, let’s see how we can use substring(int beginIndex) to extract some of the substrings of a given string:
public class SubstringsOfString {
    public static void main(String[] args) {
        String mystr = "devwithus.com";
        // Each call returns everything from the given index to the end of the string
        System.out.println(mystr.substring(2));
        System.out.println(mystr.substring(5));
        System.out.println(mystr.substring(7));
    }
}
Program output:
vwithus.com
thus.com
us.com
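The one-parameter variant behaves like the two-parameter one with endIndex set to length(). This small sketch checks that equivalence for every valid beginIndex of the same string; the class name OneArgEquivalence is only illustrative.

```java
class OneArgEquivalence {
    public static void main(String[] args) {
        String mystr = "devwithus.com";
        // substring(begin) should match substring(begin, length()) for every valid begin,
        // including begin == length(), where both return ""
        for (int begin = 0; begin <= mystr.length(); begin++) {
            String a = mystr.substring(begin);
            String b = mystr.substring(begin, mystr.length());
            if (!a.equals(b)) {
                throw new AssertionError("Mismatch at index " + begin);
            }
        }
        System.out.println("Both variants agree for every beginIndex");
    }
}
```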
substring(int beginIndex, int endIndex) Example
This overloaded method accepts two parameters and returns a new string that is a portion of the given string.
The returned substring begins with the character at the beginIndex (included) and ends with the one at the specified endIndex (excluded).
This variant of the substring() method is implemented as follows (again from the OpenJDK source of Java 9 and later):
public String substring(int beginIndex, int endIndex) {
    int length = length();
    checkBoundsBeginEnd(beginIndex, endIndex, length);
    int subLen = endIndex - beginIndex;
    if (beginIndex == 0 && endIndex == length) {
        return this;
    }
    return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
                      : StringUTF16.newString(value, beginIndex, subLen);
}

private static void checkBoundsBeginEnd(int begin, int end, int length) {
    if (begin < 0 || begin > end || end > length) {
        throw new StringIndexOutOfBoundsException(
            "begin " + begin + ", end " + end + ", length " + length);
    }
}
Now, let’s demonstrate how to extract substrings from a particular string using the substring(int beginIndex, int endIndex) method:
public class SubstringsOfString2 {
    public static void main(String[] args) {
        String mystr = "devwithus blog";
        // Characters from beginIndex (inclusive) up to endIndex (exclusive)
        System.out.println(mystr.substring(1, 5));
        System.out.println(mystr.substring(5, 5));
        System.out.println(mystr.substring(7, 9));
    }
}
The output of the above example program is:
evwi

us
Note that substring(5, 5) returns an empty string because beginIndex and endIndex are equal, which is why the second line of the output is blank.
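In practice, the bounds are often computed with indexOf() rather than hard-coded. Here is a minimal sketch of that pattern; the helpers before and after are hypothetical names chosen for illustration, and they assume that a missing separator should yield the whole string and an empty string, respectively.

```java
class SubstringWithIndexOf {
    // Hypothetical helper: text before the first occurrence of sep,
    // or the whole string when sep does not occur
    static String before(String s, char sep) {
        int idx = s.indexOf(sep);
        return idx < 0 ? s : s.substring(0, idx);
    }

    // Hypothetical helper: text after the first occurrence of sep,
    // or an empty string when sep does not occur
    static String after(String s, char sep) {
        int idx = s.indexOf(sep);
        return idx < 0 ? "" : s.substring(idx + 1);
    }

    public static void main(String[] args) {
        String site = "devwithus.com";
        System.out.println(before(site, '.')); // devwithus
        System.out.println(after(site, '.'));  // com
    }
}
```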
How Is substring() Implemented Internally in Java?
The substring() method returns the same result in every Java version, but its internal implementation changed in Java 7 update 6 (7u6).
Let’s take a look at how it works before and after that update.
Before Java 7u6
Internally, a String was backed by a char array (value) that holds its characters.
To locate the string within that array, String also defined two fields:
offset: the index in the array of the string’s first character
count: the number of characters in the string
// package-private constructor that shares the value array (no copy)
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

// substring method
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    // the new String reuses the same array with a different offset and count
    return ((beginIndex == 0) && (endIndex == count)) ? this
        : new String(offset + beginIndex, endIndex - beginIndex, value);
}
When we call substring() on a String instance, the new String object shares the original char array; only its offset and count fields are different. The array itself is neither copied nor changed.
As a result, if the original string is very long, say backed by a 2GB array, a substring keeps a reference to that entire array no matter how small it is. As long as the substring is reachable, the large array cannot be garbage collected, which can lead to a memory leak.
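To make the sharing concrete, here is a toy model of the pre-7u6 layout. SharedSliceString is a hypothetical class written for illustration, not the real java.lang.String; it only mimics the value/offset/count idea and the old substring() logic shown above.

```java
import java.util.Arrays;

// Hypothetical toy class mimicking the pre-Java 7u6 String layout:
// a shared char[] plus offset and count. It is NOT the real java.lang.String.
class SharedSliceString {
    final char[] value;
    final int offset;
    final int count;

    SharedSliceString(char[] value, int offset, int count) {
        this.value = value;     // no copy: the array is shared
        this.offset = offset;
        this.count = count;
    }

    // Mirrors the old substring(): the new object reuses the same array
    // and only gets a different offset and count
    SharedSliceString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(
                    "begin " + beginIndex + ", end " + endIndex + ", count " + count);
        }
        if (beginIndex == 0 && endIndex == count) {
            return this;
        }
        return new SharedSliceString(value, offset + beginIndex, endIndex - beginIndex);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        // A large array standing in for a very long string
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');
        big[0] = 'd';
        big[1] = 'e';
        big[2] = 'v';

        SharedSliceString original = new SharedSliceString(big, 0, big.length);
        SharedSliceString small = original.substring(0, 3);

        System.out.println(small);                          // dev
        // The 3-character slice still points at the 1,000,000-element array,
        // so that array stays reachable for as long as 'small' does
        System.out.println(small.value == original.value);  // true
        System.out.println(small.value.length);             // 1000000
    }
}
```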
Since Java 7u6
Java 7u6 changed the way substrings are extracted to avoid this problem.
Instead of sharing the same underlying char[], the substring() method now creates a new copy containing only the requested characters.
The offset and count fields were removed from the String class because they were no longer needed.
For further details about the topic, please refer to 010257.html.
Since Java 7u6, substring() passes its bounds to the String(char[], int, int) constructor, which calls Arrays.copyOfRange to create a new char[] copy:
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    // copies only the requested range into a brand-new array
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
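Please bear in mind that both approaches come with a cost: before 7u6, a small substring could keep a huge array alive, while since 7u6, every call copies characters, so it takes time and memory proportional to the length of the substring.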
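Whichever implementation is in use, a call that does not cover the whole string builds a new String object. A minimal sketch, assuming an OpenJDK runtime, of why substrings should be compared with equals() rather than ==:

```java
class SubstringIdentity {
    public static void main(String[] args) {
        String site = "devwithus.com";

        String first = site.substring(0, 3);
        String second = site.substring(0, 3);

        // Same characters, so equals() is true
        System.out.println(first.equals(second)); // true
        // Each partial-range call built its own String object, so == is false
        System.out.println(first == second);      // false
        // OpenJDK returns the original instance for a full-range call;
        // the Javadoc does not promise this, so don't rely on it
        System.out.println(site.substring(0) == site);
    }
}
```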
Important Key Points about the substring() Method
The following are some key points we should keep in mind about the substring() method in Java:
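It returns a new String instance, except that a call covering the whole string (such as substring(0)) may return the original instance, as the implementations above show. Either way, the original string is never modified, because the String class is immutable in Java
For a string of length n, there are (n*(n+1))/2 non-empty substrings when counted by position; some of them may be equal (for example, “aa” has three such substrings but only two distinct ones)
beginIndex is inclusive and endIndex is exclusive
The index 0 refers to the first character of the string. Valid substring() indices range from 0 to the length of the string, inclusive
Blank spaces count as regular characters when computing indices in substring()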
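The n(n+1)/2 count comes from choosing a pair (beginIndex, endIndex) with 0 <= beginIndex < endIndex <= n. This sketch enumerates those pairs with substring(); countByPosition and countDistinct are hypothetical helper names, and the distinct count uses a HashSet to show that repeated characters reduce the number of unique substrings.

```java
import java.util.HashSet;
import java.util.Set;

class SubstringCount {
    // Counts non-empty substrings by position: one per pair (begin, end)
    // with 0 <= begin < end <= n
    static int countByPosition(String s) {
        int count = 0;
        for (int begin = 0; begin < s.length(); begin++) {
            for (int end = begin + 1; end <= s.length(); end++) {
                count++;
            }
        }
        return count;
    }

    // Counts distinct non-empty substrings, which can be fewer than n(n+1)/2
    static int countDistinct(String s) {
        Set<String> seen = new HashSet<>();
        for (int begin = 0; begin < s.length(); begin++) {
            for (int end = begin + 1; end <= s.length(); end++) {
                seen.add(s.substring(begin, end));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String s = "dev";
        int n = s.length();
        System.out.println(countByPosition(s));  // 6
        System.out.println(n * (n + 1) / 2);     // 6
        System.out.println(countDistinct("aa")); // 2 ("a" and "aa")
    }
}
```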
Conclusion
That’s all folks. In this article, we have covered the essentials of the substring() method in Java: its two variants, when it throws exceptions, and how its implementation changed in Java 7u6.
If you have come this far, it means that you liked what you are reading. Hope I didn’t string you along too much and that you found this article useful.
Stay tuned and see you next time.