
A couple of ways to tokenize a delimited String in Java

StringTokenizer:

It is a legacy class, and its use is discouraged in new code. Even if it weren't legacy, I would avoid it because it ignores blank tokens. The code below does not output the blank token ("") between A and In, nor the last blank token after Me.

StringTokenizer st = new StringTokenizer("I,Have,A,,In,Me,", ",");
while (st.hasMoreTokens()) {
	System.out.println(st.nextToken());
}
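To make the skipped tokens easier to see, here is a minimal sketch that collects the tokens into a list instead of printing them. The class name TokenizerSkipsBlanks and the helper tokenize are made up for this illustration; only StringTokenizer itself comes from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerSkipsBlanks {

    // Hypothetical helper: gathers everything StringTokenizer hands back.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("I,Have,A,,In,Me,", ",");
        // Seven fields are present in the input, but only the five
        // non-empty ones come back.
        System.out.println(tokens.size() + " tokens: " + tokens);
    }
}
```

The two empty fields never reach the caller, which is why this class is a poor fit for delimited data where position matters.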

split method:

The Javadoc suggests the split method of String or the java.util.regex package as alternatives. The split method returns the blank tokens between two tokens; however, the last blank token (after Me) is still missing, because the one-argument split drops trailing empty strings. We need some extra code to get the last token.

Update: I completely missed the overloaded split method. string.split(DELIMITER, -1) works just fine. I'm keeping the rest of the post below as is.

final String DELIMITER = ",";
String string = "I,Have,A,,In,Me,";
for (String value : string.split(DELIMITER)) {
	System.out.println(value);
}

// Check for last token
int lastIndex = string.lastIndexOf(DELIMITER);
if (lastIndex == string.length()-1) {
	System.out.println(string.substring(lastIndex + 1));
}
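For comparison, the overload mentioned in the update can be sketched as follows. A negative limit tells split to keep trailing empty strings, so no extra check is needed. The class name SplitWithLimit and the helper tokenize are assumptions for this example; quoting the delimiter with Pattern.quote is my addition, since split treats its first argument as a regular expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SplitWithLimit {

    // Hypothetical helper: splits on a literal delimiter and keeps every
    // token, including empty ones in the middle and at the end.
    static List<String> tokenize(String input, String delimiter) {
        // split() takes a regex, so quote the delimiter in case it contains
        // metacharacters such as "|" or ".".
        // A negative limit disables the removal of trailing empty strings.
        return Arrays.asList(input.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        // Brackets make the two empty tokens visible in the output.
        for (String value : tokenize("I,Have,A,,In,Me,", ",")) {
            System.out.println("[" + value + "]");
        }
    }
}
```

With a comma as the delimiter the quoting makes no difference, but it avoids surprises if the delimiter is later changed to something like a pipe or a dot.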

Write your own tokenizer code:

String string = "I,Have,A,,In,Me,";
final String DELIMITER = ",";

int i = 0,  j = string.indexOf(DELIMITER);

// while there are tokens
while (j != -1) {
	System.out.println(string.substring(i, j));
	i = j + 1;
	j = string.indexOf(DELIMITER, i);
}

// extract the last token (empty when the string ends with the delimiter)
if (i <= string.length()) {
	System.out.println(string.substring(i));
}

I got a little crazy and folded the index updates into the loop condition.

String string = "I,Have,A,,In,Me,";
final String DELIMITER = ",";

int i = 0, j = -1;

// while there are tokens
while ((j = string.indexOf(DELIMITER, (i = ++j))) != -1) {
	System.out.println(string.substring(i, j));
}

// extract the last token
System.out.println(string.substring(i));

The same thing using a for loop.

String string = "I,Have,A,,In,Me,";
final String DELIMITER = ",";

int i = 0;

// for each token in the string
for (int j = -1; (j = string.indexOf(DELIMITER, (i = ++j))) != -1;) {
	System.out.println(string.substring(i, j));
}

// extract the last token
System.out.println(string.substring(i));
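The loops above print as they go. If the tokens are needed later, the same indexOf/substring idea can be wrapped in a method that returns a list. This is only a sketch: the names SimpleTokenizer and tokenize are invented here, and advancing by delimiter.length() (rather than by 1) is a small generalization of the code above so that multi-character delimiters also work.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleTokenizer {

    // Hypothetical helper based on the first while-loop version above.
    static List<String> tokenize(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            // indexOf("") always matches at the current index, so an empty
            // delimiter would loop forever.
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> tokens = new ArrayList<>();
        int i = 0;                        // start of the current token
        int j = input.indexOf(delimiter); // position of the next delimiter

        // while there are tokens
        while (j != -1) {
            tokens.add(input.substring(i, j));
            i = j + delimiter.length();
            j = input.indexOf(delimiter, i);
        }

        // The last token: empty when the input ends with the delimiter.
        tokens.add(input.substring(i));
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("I,Have,A,,In,Me,", ",")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

I kept the plain while loop here instead of the compact variants, since the assignments inside the condition save a few lines but are harder to read.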

The most basic benchmarking: System.nanoTime() around a while loop with 1,000,000 iterations. The averages of 5 such runs are below.

Method           Average runtime
Tokenizer        2.569156437   (quicker, but doesn't give the desired output)
String.split()   4.567683168   (remember the last token?)
While            2.690626660
2nd While        2.765533696
For              2.717799994
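The timing code itself is not shown in this post, so the following is only a guess at its general shape: time a loop of N iterations with System.nanoTime(), repeat for a few runs, and average. The class NaiveBenchmark, the method averageSeconds, the sink field, and the choice to report seconds are all assumptions of this sketch.

```java
final class NaiveBenchmark {

    // Tasks write here so the JIT is less likely to discard the work.
    static int sink;

    // Hypothetical helper: average wall-clock time, in seconds, of `runs`
    // runs that each execute `task` `iterations` times.
    static double averageSeconds(Runnable task, int iterations, int runs) {
        long totalNanos = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            for (int n = 0; n < iterations; n++) {
                task.run();
            }
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        String input = "I,Have,A,,In,Me,";
        double avg = averageSeconds(
                () -> sink += input.split(",", -1).length, 1_000_000, 5);
        System.out.printf("String.split(DELIMITER, -1): %.9f s%n", avg);
    }
}
```

A harness like this does no JIT warm-up and is sensitive to garbage collection pauses, so the numbers are rough at best. For anything beyond a quick comparison, a dedicated tool such as JMH would be a better choice.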
