Scanner class and regex problem

Discussion in 'Java' started by Lee Weiner, Jul 2, 2005.

  1. Lee Weiner

    Lee Weiner Guest

    I teach Java, and we're switching to 1.5.0 next semester. I was thinking
    about using the Scanner class to read data from text files, but I'm having
    a problem specifying a delimiter string.

    The file I'm using for the following example contains two records:

    Weiner@572-6544@57
    Kirby@572-6544@36

    Using the following:

    import java.util.Scanner;
    import java.io.FileNotFoundException;
    import java.io.File;

    public class ScannerFile
    {
        public static void main ( String[] args )
        {
            try
            {
                Scanner scan = new Scanner( new File( "lee.txt" ) );
                scan.useDelimiter( "\\s+" ); //1 or more white space chars
                while( scan.hasNext() )
                {
                    System.out.println( "*" + scan.next() + "*" );
                }
                scan.close();
            }
            catch(FileNotFoundException exc)
            {
                System.out.println( "Error - Input file not found. Terminating." );
                System.exit( 1 );
            }
            System.exit(0);
        }
    }

    I get:

    *Weiner@572-6544@57*
    *Kirby@572-6544@36*

    Exactly what I expect, but if I also want to delimit on the "@" signs with

    scan.useDelimiter( "[@\\s+]" );

    I get:

    *Weiner*
    *572-6544*
    *57*
    **
    *Kirby*
    *572-6544*
    *36*
    **

    Can anyone tell me what I'm doing to cause that extra empty token at the
    end of each record? I'm running under Windows XP, if that's important.

    Lee Weiner
    lee AT leeweiner DOT org
    Lee Weiner, Jul 2, 2005

  2. Chris Smith

    Chris Smith Guest

    Lee Weiner wrote:
    > Exactly what I expect, but if I also want to delimit on the "@" signs with
    >
    > scan.useDelimiter( "[@\\s+]" );
    >
    > I get:
    >
    > *Weiner*
    > *572-6544*
    > *57*
    > **
    > *Kirby*
    > *572-6544*
    > *36*
    > **
    >
    > Can anyone tell me what I'm doing to cause that extra empty token at the
    > end of each record?


    Yes. Your regular expression is faulty. [@\\s+] means "either an @
    symbol, or a single whitespace character, or a + symbol". None of your
    input contained a + character, but things would have gotten even weirder
    if it had. Perhaps you meant [@\\s]+ or @|(\\s+) (the difference being
    that the first would not produce an empty token between two consecutive
    @ characters, while the second one would).
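
A minimal sketch of the three patterns side by side, run against an in-memory string instead of the file. It assumes the file ends each record with a Windows line ending (\r\n), which is likely on Windows XP but is not confirmed in the thread. Under that assumption, [@\\s+] treats \r and \n as two separate one-character delimiters, and the empty string between them is the extra token. The class name DelimiterDemo and the helper tokens() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo
{
    // Collects every token Scanner produces for the given delimiter regex.
    static List<String> tokens( String input, String delimiter )
    {
        List<String> result = new ArrayList<String>();
        Scanner scan = new Scanner( input );
        scan.useDelimiter( delimiter );
        while( scan.hasNext() )
        {
            result.add( scan.next() );
        }
        scan.close();
        return result;
    }

    public static void main( String[] args )
    {
        // Assumption: the two records end with Windows line endings (\r\n).
        String data = "Weiner@572-6544@57\r\nKirby@572-6544@36\r\n";

        // Character class: each single @, whitespace char, or + is a delimiter,
        // so an empty token appears between \r and \n.
        System.out.println( tokens( data, "[@\\s+]" ) );

        // One or more @ or whitespace chars form a single delimiter.
        System.out.println( tokens( data, "[@\\s]+" ) );

        // Either one @, or a run of whitespace.
        System.out.println( tokens( data, "@|(\\s+)" ) );

        // The two fixes differ only when @ characters are adjacent.
        System.out.println( tokens( "a@@b", "[@\\s]+" ) );   // no empty token
        System.out.println( tokens( "a@@b", "@|(\\s+)" ) );  // empty token between the @s
    }
}
```

With the sample data both corrected patterns yield the same six tokens. The choice between them matters only if an empty field (two consecutive @ characters) should be preserved, in which case @|(\\s+) is the one that keeps it.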

    Incidentally, when you're reading multiple records, it's far safer to
    separate records first, then parse the record content. A simple error
    in the input file here, rather than being detected, could cause you to
    confuse names for phone numbers for the entire rest of the file and
    store corrupt data that has to be hunted and purged after the error has
    been discovered. That's not good.
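
One way to read "separate records first" is sketched below: take the input a line at a time, then split each line on @ and check the field count before trusting it. The three-field layout is an assumption taken from the sample data, and the names RecordParser, parseRecord and parseAll are hypothetical. The sketch parses a string for brevity; the same loop works with a Scanner opened on a File.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class RecordParser
{
    // Assumption: every record has exactly three @-separated fields.
    static final int FIELD_COUNT = 3;

    // Splits one record into its fields, or fails loudly with the line number.
    static String[] parseRecord( String line, int lineNumber )
    {
        String[] fields = line.split( "@", -1 ); // -1 keeps trailing empty fields
        if( fields.length != FIELD_COUNT )
        {
            throw new IllegalArgumentException( "Line " + lineNumber
                + ": expected " + FIELD_COUNT + " fields, found " + fields.length );
        }
        return fields;
    }

    // Step 1: separate records (lines). Step 2: parse each record on its own.
    static List<String[]> parseAll( String text )
    {
        List<String[]> records = new ArrayList<String[]>();
        Scanner scan = new Scanner( text );
        int lineNumber = 0;
        while( scan.hasNextLine() )
        {
            String line = scan.nextLine(); // strips \n or \r\n
            lineNumber++;
            if( line.trim().length() == 0 )
            {
                continue; // skip blank lines
            }
            records.add( parseRecord( line, lineNumber ) );
        }
        scan.close();
        return records;
    }

    public static void main( String[] args )
    {
        String data = "Weiner@572-6544@57\r\nKirby@572-6544@36\r\n";
        for( String[] fields : parseAll( data ) )
        {
            System.out.println( fields[0] + " / " + fields[1] + " / " + fields[2] );
        }
    }
}
```

A record with a missing or extra field now stops the run at the offending line instead of silently shifting every later field by one position.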

    Also incidentally, this question and many others like it should be
    carefully read by the fanatics at Sun who seem to think, and write in
    documentation, that just because a problem can be solved by regular
    expressions, it has to be.

    --
    www.designacourse.com
    The Easiest Way To Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
    Chris Smith, Jul 2, 2005