
A couple of ways to tokenize a delimited String in Java

StringTokenizer:

StringTokenizer is a legacy class, and its use is discouraged in new code. I wouldn’t use it even if it weren’t a legacy class, because it ignores blank tokens. The code below will not output the blank token (“”) between A and In, nor the last blank token after Me.

StringTokenizer st = new StringTokenizer("I,Have,A,,In,Me,", ",");
while (st.hasMoreTokens()) {
	System.out.println(st.nextToken());
}
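
To make the skipped blanks easier to see, here is a minimal sketch that collects the tokens into a list instead of printing them. The class name TokenizerBlankDemo and the helper tokens are made up for this illustration; only StringTokenizer itself comes from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerBlankDemo {

    // Hypothetical helper: gathers every token StringTokenizer returns.
    static List<String> tokens(String input, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiters);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> found = tokens("I,Have,A,,In,Me,", ",");
        // The input has 7 comma-separated fields, but only the 5 non-blank ones come back.
        System.out.println(found);        // prints [I, Have, A, In, Me]
        System.out.println(found.size()); // prints 5
    }
}
```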

split method:

The Javadoc suggests the split method of String or the java.util.regex package as alternatives. The split method returns the blank tokens that fall between two tokens; however, the last blank token (after Me) will still be missing, because split discards trailing empty strings. We need some extra code to get the last token.

Update: I completely missed the overloaded split method - string.split(DELIMITER,-1) will work just fine. I’m keeping the rest of the post below as is.

final String DELIMITER = ",";
String string = "I,Have,A,,In,Me,";
for (String value : string.split(DELIMITER)) {
	System.out.println(value);
}

// Check for last token
int lastIndex = string.lastIndexOf(DELIMITER);
if (lastIndex == string.length()-1) {
	System.out.println(string.substring(lastIndex + 1));
}
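
For comparison, the overloaded split(regex, limit) mentioned in the update can be sketched as follows. A negative limit tells split to keep trailing empty strings. The class name SplitLimitDemo and the helper splitKeepingBlanks are made up for this illustration; Pattern.quote is used because split treats its first argument as a regular expression, so a delimiter such as "|" or "." would otherwise need escaping.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLimitDemo {

    // Hypothetical helper: splits on a literal delimiter and keeps every blank token.
    static String[] splitKeepingBlanks(String input, String delimiter) {
        // Pattern.quote makes the delimiter a literal; -1 keeps trailing empty strings.
        return input.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String string = "I,Have,A,,In,Me,";

        // Default split: the trailing blank token is dropped.
        System.out.println(string.split(",").length);                  // prints 6

        // Negative limit: the trailing blank token is kept.
        String[] all = splitKeepingBlanks(string, ",");
        System.out.println(all.length);                                // prints 7
        System.out.println(Arrays.toString(all));                      // prints [I, Have, A, , In, Me, ]

        // A delimiter that is a regex metacharacter still works when quoted.
        System.out.println(splitKeepingBlanks("a|b|", "|").length);    // prints 3
    }
}
```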

Write your own tokenizer code:

String string = "I,Have,A,,In,Me,";
final String DELIMITER = ",";

int i = 0,  j = string.indexOf(DELIMITER);

// while there are tokens
while (j != -1) {
	System.out.println(string.substring(i, j));
	i = j + 1;
	j = string.indexOf(DELIMITER, i);
}

// extract the last token (an empty string when the input ends with a delimiter)
if (i <= string.length()) {
	System.out.println(string.substring(i));
}

I got a little crazy and changed the above code.

String string = "I,Have,A,,In,Me,";
final String DELIMITER = ",";

int i = 0, j = -1;

// while there are tokens
while ((j = string.indexOf(DELIMITER, (i = ++j))) != -1) {
	System.out.println(string.substring(i, j));
}

// extract the last token
System.out.println(string.substring(i));

The same thing using a for loop.

String string = "I,Have,A,,In,Me,";
final String DELIMITER = ",";

int i = 0;

// for each token in the string
for (int j = -1; (j = string.indexOf(DELIMITER, (i = ++j))) != -1;) {
	System.out.println(string.substring(i, j));
}

// extract the last token
System.out.println(string.substring(i));
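
The indexOf loops above can also be wrapped in a reusable method that returns the tokens instead of printing them. This is only a sketch: the class name ManualTokenizer and the method tokenize are made up for this illustration, and it advances by delimiter.length() rather than 1 so that multi-character delimiters work too, which goes slightly beyond the snippets above.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualTokenizer {

    // Hypothetical helper: splits on a literal delimiter and keeps blank tokens,
    // including a trailing one.
    static List<String> tokenize(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            // indexOf("") always matches at the start index, which would loop forever.
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end = input.indexOf(delimiter, start);

        // while there are tokens followed by a delimiter
        while (end != -1) {
            tokens.add(input.substring(start, end));
            start = end + delimiter.length();
            end = input.indexOf(delimiter, start);
        }

        // the last token (empty when the input ends with a delimiter)
        tokens.add(input.substring(start));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I,Have,A,,In,Me,", ","));
        // prints [I, Have, A, , In, Me, ]
    }
}
```

Unlike split, this treats the delimiter as plain text, so no regular-expression escaping is involved.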

The most basic benchmarking: each method was timed with System.nanoTime() in a while loop of 1,000,000 iterations. The averages of 5 such runs are below.

Method            Average runtime
StringTokenizer   2.569156437   (quicker, but doesn’t give the desired output)
String.split()    4.567683168   (remember the last token?)
While             2.690626660
2nd While         2.765533696
For               2.717799994
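
A timing loop of the kind described above might look like the sketch below. It is not the harness that produced the numbers in this post: the class name TokenizeTiming and its methods are made up for this illustration, and it counts tokens instead of printing them so that console output doesn't dominate the measurement. A bare nanoTime loop is sensitive to JIT warm-up, so treat its results as rough; a dedicated harness such as JMH would be more trustworthy.

```java
import java.util.function.IntSupplier;

public class TokenizeTiming {

    static final String INPUT = "I,Have,A,,In,Me,";
    static final String DELIMITER = ",";

    // Counts tokens with the overloaded split, keeping trailing blanks.
    static int countWithSplit() {
        return INPUT.split(DELIMITER, -1).length;
    }

    // Counts tokens with the hand-written indexOf loop.
    static int countWithIndexOf() {
        int count = 0;
        int i = 0;
        int j = INPUT.indexOf(DELIMITER);
        while (j != -1) {
            count++;
            i = j + 1;
            j = INPUT.indexOf(DELIMITER, i);
        }
        return count + 1; // the last token after the final delimiter
    }

    // Runs the task the given number of times and returns the elapsed seconds.
    static double secondsFor(IntSupplier task, int iterations) {
        long sink = 0; // accumulates results so the work is not optimized away
        long start = System.nanoTime();
        for (int n = 0; n < iterations; n++) {
            sink += task.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true here; keeps sink in use
        }
        return elapsed / 1e9;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        System.out.println("split:   " + secondsFor(TokenizeTiming::countWithSplit, iterations) + " s");
        System.out.println("indexOf: " + secondsFor(TokenizeTiming::countWithIndexOf, iterations) + " s");
    }
}
```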