A couple of ways to tokenize a delimited String in Java

  • StringTokenizer:
  • It is a legacy class, and its use is discouraged in new code. I would avoid it even if it weren’t legacy, because it ignores blank tokens. The code below prints neither the blank token (“”) between A and In nor the last blank token after Me.

    		StringTokenizer st = new StringTokenizer("I,Have,A,,In,Me,", ",");
    		while (st.hasMoreTokens()) {
    			System.out.println(st.nextToken());
    		}
    
  • split method:
  • The StringTokenizer Javadoc suggests the split method of String or the java.util.regex package as alternatives. The split method returns the blank token between two tokens; however, the last blank token (after Me) will still be missing, because split discards trailing empty strings by default. We need some extra code to get the last token.

    		final String DELIMITER = ",";
    		String string = "I,Have,A,,In,Me,";
    		for (String value : string.split(DELIMITER)) {
    			System.out.println(value);
    		}
    
    		// Check for the last token: a trailing delimiter means a blank token was dropped
    		int lastIndex = string.lastIndexOf(DELIMITER);
    		if (lastIndex == string.length()-1) {
    			System.out.println(string.substring(lastIndex + 1));
    		}
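
As an alternative to the extra check, split also has a two-argument overload: a negative limit keeps trailing empty strings. Here is a minimal sketch; the class and method names are my own, and Pattern.quote is there only because split treats its argument as a regular expression, which matters for delimiters such as "|" or ".".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SplitKeepTrailing {

    // Splits on a literal delimiter and keeps every blank token, trailing ones included.
    static List<String> tokens(String input, String delimiter) {
        // Pattern.quote makes split treat the delimiter as literal text, not a regex.
        // A negative limit tells split not to discard trailing empty strings.
        return Arrays.asList(input.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        // Brackets make the blank tokens visible in the output.
        for (String value : tokens("I,Have,A,,In,Me,", ",")) {
            System.out.println("[" + value + "]");
        }
    }
}
```

For the sample string this yields seven tokens, with blank ones at positions four and seven.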
    
  • Write your own tokenizer code:
    		String string = "I,Have,A,,In,Me,";
    		final String DELIMITER = ",";
    
    		int i = 0,  j = string.indexOf(DELIMITER);
    
    		// while there are tokens
    		while (j != -1) {
    			System.out.println(string.substring(i, j));
    			i = j + 1;
    			j = string.indexOf(DELIMITER, i);
    		}
    
    		// extract the last token
    		if (i <= string.length()) {
    			System.out.println(string.substring(i));
    		}
    

    I got a little crazy and folded the index updates into the loop condition.

    		String string = "I,Have,A,,In,Me,";
    		final String DELIMITER = ",";
    
    		int i = 0, j = -1;
    
    		// while there are tokens
    		while ((j = string.indexOf(DELIMITER, (i = ++j))) != -1) {
    			System.out.println(string.substring(i, j));
    		}
    
    		// extract the last token
    		if (i <= string.length()) {
    			System.out.println(string.substring(i));
    		}
    

    The same thing using a for loop.

    		String string = "I,Have,A,,In,Me,";
    		final String DELIMITER = ",";
    
    		int i = 0;
    
    		// for each token in the string
    		for (int j = -1; (j = string.indexOf(DELIMITER, (i = ++j))) != -1;) {
    			System.out.println(string.substring(i, j));
    		}
    
    		// extract the last token
    		if (i <= string.length()) {
    			System.out.println(string.substring(i));
    		}
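
The loops above print each token and advance by one character, so they assume a single-character delimiter. A sketch of the same indexOf approach wrapped in a reusable method might look like this; the class name, the method name, and the choice to return a List are my own assumptions, not part of the snippets above.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfTokenizer {

    // Returns every token, including blank ones between and after delimiters.
    static List<String> tokenize(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end = input.indexOf(delimiter);

        // while there are delimiters left
        while (end != -1) {
            tokens.add(input.substring(start, end));
            // skip the whole delimiter, so multi-character delimiters also work
            start = end + delimiter.length();
            end = input.indexOf(delimiter, start);
        }

        // whatever follows the last delimiter is the last token (possibly blank)
        tokens.add(input.substring(start));
        return tokens;
    }

    public static void main(String[] args) {
        for (String value : tokenize("I,Have,A,,In,Me,", ",")) {
            System.out.println("[" + value + "]");
        }
    }
}
```

Advancing by delimiter.length() instead of 1 is the only behavioral change from the while loop above.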
    

    The most basic benchmarking: System.nanoTime() around a while loop with 1,000,000 iterations. The averages of 5 such runs are below.

    Tokenizer 2.569156437 (quicker, but doesn’t give the desired output)
    String.split() 4.567683168 (remember the last token?)
    While 2.690626660
    2nd While 2.765533696
    For 2.717799994
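
For anyone who wants to repeat the measurement, here is a rough sketch of that kind of timing loop. It is not the harness behind the numbers above: the class and method names are hypothetical, it counts tokens instead of printing them, and it does no JIT warm-up, so treat the results as indicative only.

```java
import java.util.StringTokenizer;
import java.util.function.IntSupplier;

class TokenizerTiming {
    static final String INPUT = "I,Have,A,,In,Me,";
    static final String DELIMITER = ",";

    // StringTokenizer skips blank tokens, so it sees 5 tokens here.
    static int countWithStringTokenizer() {
        StringTokenizer st = new StringTokenizer(INPUT, DELIMITER);
        int count = 0;
        while (st.hasMoreTokens()) {
            st.nextToken();
            count++;
        }
        return count;
    }

    // split drops only the trailing blank token, so it sees 6.
    static int countWithSplit() {
        return INPUT.split(DELIMITER).length;
    }

    // The indexOf loop keeps every token, so it sees 7.
    static int countWithIndexOf() {
        int count = 0;
        int from = 0;
        int j;
        while ((j = INPUT.indexOf(DELIMITER, from)) != -1) {
            count++;
            from = j + 1;
        }
        return count + 1; // the text after the last delimiter is also a token
    }

    // Runs the task the given number of times and returns the elapsed seconds.
    static double elapsedSeconds(int iterations, IntSupplier task) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int n = 0; n < iterations; n++) {
            checksum += task.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        if (checksum < 0) {
            // Never expected; using the checksum keeps the work from being optimized away.
            throw new IllegalStateException("unexpected checksum");
        }
        return elapsed / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        System.out.println("StringTokenizer " + elapsedSeconds(iterations, TokenizerTiming::countWithStringTokenizer));
        System.out.println("String.split()  " + elapsedSeconds(iterations, TokenizerTiming::countWithSplit));
        System.out.println("indexOf loop    " + elapsedSeconds(iterations, TokenizerTiming::countWithIndexOf));
    }
}
```

The three counts (5, 6 and 7) are a reminder that the approaches do not do the same amount of work, which is worth keeping in mind when comparing their timings.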
