better way to parse Tabs

Discussion in 'Java' started by MileHighCelt, Dec 6, 2005.

  1. MileHighCelt

    MileHighCelt Guest

    I have been looking through newsgroups and the API docs, and there
    seems to be some difficulty in parsing a tab-delimited file. If a user
    uploads a tab-delimited file that could contain nulls (empty fields),
    is this necessarily the best approach:

    InputStream in = theFile.getInputStream();
    BufferedReader r = new BufferedReader(new InputStreamReader(in));
    String line;
    while ((line = r.readLine()) != null) {
        StringTokenizer st = new StringTokenizer(line, "\t", true);
        MyBean region = new MyBean();

        region.setLab_id(Integer.parseInt(st.nextToken()));
        region.setControl_nmbr(st.nextToken());
        ...
    }

    As I understand it, the "true" argument in the tokenizer will handle
    the nulls/no value as empty strings?

    Is there a better way to solve this? There are probably 50 columns
    per row, with a variety of date, int, and String values.

    Thank you for your opinion and advice.
     
    MileHighCelt, Dec 6, 2005
    #1

  2. > As I understand it, the "true" argument in the tokenizer will handle
    > the nulls/no value as empty strings?


    No. If you read the documentation carefully, you will see that "true"
    means that the tokenizer returns the delimiters (in your case "\t") as
    tokens. So if you want to use a StringTokenizer, you will always have
    to check for the "\t" token and also handle consecutive "\t" tokens.
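
    To make that bookkeeping concrete, here is a minimal sketch of how the
    delimiter tokens could be turned back into fields, including the empty
    ones. The class and method names (TabTokenizerSketch, fields) are
    hypothetical and not from the original post; it only illustrates the
    extra work the StringTokenizer approach requires.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.StringTokenizer;

    // Hypothetical helper, for illustration only.
    class TabTokenizerSketch {

        // Splits a line on tabs with returnDelims = true and adds an empty
        // string wherever two tabs are adjacent, or where the line starts
        // or ends with a tab.
        static List<String> fields(String line) {
            List<String> result = new ArrayList<>();
            StringTokenizer st = new StringTokenizer(line, "\t", true);
            // Treat the start of the line as if it followed a delimiter.
            boolean lastWasDelimiter = true;
            while (st.hasMoreTokens()) {
                String token = st.nextToken();
                if (token.equals("\t")) {
                    if (lastWasDelimiter) {
                        result.add(""); // empty field before this tab
                    }
                    lastWasDelimiter = true;
                } else {
                    result.add(token);
                    lastWasDelimiter = false;
                }
            }
            if (lastWasDelimiter) {
                result.add(""); // trailing empty field, or an empty line
            }
            return result;
        }
    }
    ```

    With this sketch, fields("a\t\tb") gives the three fields "a", "" and
    "b", and a line ending in a tab gets a final empty field.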

    > Is there a better way to solve this?


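    Yes. You could do this (Java 1.4+):

    String[] elements = line.split("\t", -1);

    In the returned string array, empty fields (\t\t) come back as empty
    strings. The negative limit matters: a plain split("\t") discards
    trailing empty strings, so a row ending in empty columns would yield a
    shorter array.
    All you have to do then is convert the relevant elements to Date or
    int, always checking for the empty string first, of course.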

    region.setLab_id(convertToInt(elements[0]));
    region.setControl_nmbr(elements[1]); // no conversion needed
    ....
    region.setSomeDateField(convertToDate(elements[n]));
    ....
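    // Returns 0 for an empty field, otherwise the parsed int.
    int convertToInt(String str)
    {
        return (str.length() == 0) ? 0 : Integer.parseInt(str);
    }

    // Returns null for an empty field, otherwise the parsed date.
    // SimpleDateFormat.parse throws the checked java.text.ParseException.
    Date convertToDate(String str) throws ParseException
    {
        return (str.length() == 0) ? null :
            someSimpleDateFormatInstance.parse(str);
    }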
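
    Putting the pieces together, the split-based approach could be
    sketched as below. This is only an illustration under assumptions: the
    class name TabSplitSketch, the pattern parameter, and the choice to
    rethrow a bad date as IllegalArgumentException are all hypothetical
    and not part of the original suggestion.

    ```java
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Hypothetical helper, for illustration only.
    class TabSplitSketch {

        // A negative limit keeps trailing empty strings, so every row
        // yields one element per column even when the last columns are empty.
        static String[] fields(String line) {
            return line.split("\t", -1);
        }

        // Empty field becomes 0; note this cannot be told apart from a real 0.
        static int convertToInt(String str) {
            return str.isEmpty() ? 0 : Integer.parseInt(str);
        }

        // Empty field becomes null; the pattern (e.g. "yyyy-MM-dd") is an
        // assumption, since the original post does not name a date format.
        static Date convertToDate(String str, String pattern) {
            if (str.isEmpty()) {
                return null;
            }
            SimpleDateFormat format = new SimpleDateFormat(pattern);
            format.setLenient(false); // reject impossible dates such as month 13
            try {
                return format.parse(str);
            } catch (ParseException e) {
                throw new IllegalArgumentException("Unparseable date: " + str, e);
            }
        }
    }
    ```

    For a row such as "1\tABC\t\t", fields returns four elements, whereas
    split("\t") without the limit would return only two.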

    Regards
     
    Jean-Francois Briere, Dec 7, 2005
    #2

  3. MileHighCelt

    MileHighCelt Guest

    Thank you - that is exactly what I was looking for! I will swap out
    the StringTokenizer I implemented as it sure seems slow.
     
    MileHighCelt, Dec 7, 2005
    #3
