A couple of ways to tokenize a delimited String in Java
StringTokenizer:
It is a legacy class and its use is discouraged in new code. I don’t like to use this class even if it weren’t a legacy class because it ignores blank tokens. Below code will not output the blank("") between A and In and the last blank token after Me.
StringTokenizer st = new StringTokenizer("I,Have,A,,In,Me,", ",");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}split method:
Javadoc suggests split method of String or the java.util.regex package as alternatives. The split method returns the blank tokens in-between two tokens; however, the last blank token(after Me) will still be missing. We need some extra code to get the last token.
Update: I completely missed the overloaded split method - string.split(DELIMITER,-1) will work just fine. I’m keeping the rest of the post below as is.
final String DELIMITER = ",";
String string = "I,Have,A,,In,Me,";
for (String value : string.split(DELIMITER)) {
System.out.println(value);
}
// Check for last token
int lastIndex = string.lastIndexOf(DELIMITER);
if (lastIndex == string.length()-1) {
System.out.println(string.substring(lastIndex + 1));
}Write your own tokenizer code:
String string = "I,Have,A,,In,Me,";
final String DELIMITER = ",";
int i = 0, j = string.indexOf(DELIMITER);
// while there are tokens
while (j != -1) {
System.out.println(string.substring(i, j));
i = j + 1;
j = string.indexOf(DELIMITER, i);
}
// extract the last token
if (i System.out.println(string.substring(i));
}I got a little crazy and changed the above code.
String string = "I,Have,A,,In,Me,";
final String DELIMITER = ",";
int i = 0, j = -1;
// while there are tokens
while ((j = string.indexOf(DELIMITER, (i = ++j))) != -1) {
System.out.println(string.substring(i, j));
}
// extract the last token
System.out.println(string.substring(i));
}Same thing using for.
String string = "I,Have,A,,In,Me,";
final String DELIMITER = ",";
int i = 0;
// for each token in the string
for (int j = -1; (j = string.indexOf(DELIMITER, (i = ++j))) != -1;) {
System.out.println(string.substring(i, j));
}
// extract the last token
System.out.println(string.substring(i));
}The most basic benchmarking - using System.nanoTime() in while loop with 1,000,000 iterations. The average of 5 such runs are below.
| Method | average runtime |
|---|---|
| Tokenizer | 2.569156437 (quicker but doesn’t give the desired output.) |
| String.split() | 4.567683168 (remember the last token?) |
| While | 2.690626660 |
| 2nd While | 2.765533696 |
| For | 2.717799994 |
Thank You!
Your comment has been submitted. It will appear on this page shortly! OKYikes, Sorry!
Error occured. Couldn't submit your comment. Please try again. Thank You! OK